This is truly the golden age of the developer, especially for developers in China. We have always lagged behind the pace of development elsewhere in the world. But now we have the biggest mobile internet market in the world, we have many talented people, and we follow the newest technology closely. I think Chinese developers can make a big difference in the mobile internet era. Don't let such an opportunity slip away; let's seize it!
I would also like to share a post with the same title: THE GOLDEN AGE OF THE DEVELOPER
It is from: http://www.kernelmag.com/scene/2011/12/developers-developers-developers/
There’s never been a better time to be a developer, says The Kernel’s technical editor. But in exchange for all the resources laid out in front of you, what are you doing to give back?
There’s never been a better time to be a developer. Thanks to an unprecedented range of open-source software, learning resources and useful web services at our disposal, we can learn new languages, get help, collaborate with others and, if our ideas win traction, there’s now a multitude of investors waiting in the wings to help us build companies around our products.
This is not to say that our work is easy. Standards must remain high. But the resources available offer us the opportunity to move faster and make even more progress. The nature of innovation means that many of our ideas will not succeed, making determination vital for seeing ideas through. But the opportunity is here, my friends. We are the kingmakers.
The good news is that this golden age has made you the developer you are and will continue to help you. The even better news is that you have the chance now to, in the slightly emetic language of the Valley, “pay it forward”.
The first step is following. Developers who follow are vital simply because they use open-source software in their work. They also join mailing lists to keep up to date with the latest news, go along to local meet-ups and watch repos on Github. The number of followers attached to any technology gives an important indication of its popularity.
The next is contributing. Contributing code, ideas and experience all help to improve the ecosystem. At its simplest, this might involve sending a tiny pull request on Github or delivering a talk at a local meet-up. I say simplest, but doing either of these for the first time is terrifying. Once you’ve done it once, you realise it’s not so bad. And no contribution is too small. I only sent my first pull request recently and it consisted of tidying up some missing HTML attributes on Twitter’s Bootstrap. But I still had to summon all my courage before clicking Submit.
The big challenge is leading. It’s not as scary as it sounds. It’s really just an extension of contributing. If you make a significant contribution then you automatically become a leader. High profile leaders such as _why, DHH and notch are easy to point out, but many leaders aren’t as well known. Anyone who open sources a new project on Github, organises a local meet-up or writes a couple of well-trafficked blog posts lands in this category.
I only describe these levels for illustrative purposes. If you know which level you’re at then I encourage you to start thinking about how to move up the ladder. Whatever kind of developer you are, you’ll find it pays dividends. Levelling-up is extremely satisfying and helps improve your standing. My pull request to Twitter earned me a free lunch at their offices and a t-shirt. (Offer not guaranteed.)
The real benefit, though, is that moving the needle helps the entire system become smarter and faster. Everybody wins.
We hope The Kernel will play its part in this golden age of the developer. Over the coming months, we will be featuring more developer-focussed content from people like you. So if you’d like to take an in-depth look at a new technology, or pen an essay about an issue significant to you, then please fill out our contributor form.
Here’s to the future. 
Journey to the Center of the Linux Kernel: Traffic Control, Shaping and QoS
1 Introduction
2 Motivation
3 The basics of Traffic Control
3.1 First contact
3.2 Netfilter MARK
3.3 Two classes in a tree
3.4 Connecting the marks to the tree
4 Twenty Thousand Leagues Under the Code
4.1 Classless Disciplines
4.1.1 PFIFO_FAST
4.1.2 SFQ - Stochastic Fairness Queuing
4.1.2.1 SFQ Hashing algorithm
4.2 Classful Disciplines
4.2.1 TBF - Token Bucket Filter
4.2.1.1 DSL and ATM, the Ox that believed to be Frog
4.2.1.2 The limits of TBF
4.2.2 HTB - Hierarchical Token Bucket
4.2.2.1 Hysteresis and HTB
4.2.3 CoDel
4.2.4 FQ_CoDel
4.2.5 HFSC - Hierarchical Fair Service Curve
4.2.6 QFQ - Quick Fair Queueing
4.2.7 RED - Random Early Detection
4.2.8 CHOKe - CHOose and {Keep,Kill}
5 Shaping the traffic on the Home Network
6 A word about "BufferBloat"
6.1 What happens in the buffer ?
Journey to the Center of the Linux Kernel: Traffic Control, Shaping and QoS
— Julien Vehent - see revisions
1 Introduction
This document describes the Traffic Control subsystem of the Linux Kernel in depth, algorithm by algorithm, and shows how it can be used to manage the outgoing traffic of a Linux system. Throughout the chapters, we will discuss both the theory behind Traffic Control and its usage, and demonstrate how one can gain complete control over the packets passing through their system.

[Figure: a QoS graph]
The initial target of this paper was to gain better control over a small DSL uplink. It grew over time to cover a lot more than that. 99% of the information provided here can be applied to any type of server, as well as routers, firewalls, etc.
The Traffic Control topic is large and in constant evolution, as is the Linux Kernel. The real credit goes to the developers behind the /net directory of the kernel, and all of the researchers who created and improved all of these algorithms. This is merely an attempt to document some of this work for the masses. Any participation and comments are welcome, in particular if you spotted an inconsistency somewhere. Please email julien[at]linuxwall.info, your messages are always most appreciated.
For technical discussion, since the LARTC mailing list no longer exists, try these two:
* Netfilter users mailing list, for general discussions
* NetDev mailing list, where the magic happens (developers' ML)
2 Motivation

This article was initially published in the French issue of GNU/Linux Magazine France #127, in May 2010. GLMF is kind enough to provide a contract that releases the content of the article under Creative Commons after some time. I have extended the initial article quite a bit since, but you can still find the original French version here.
My interest in the traffic shaping subsystem of Linux started around 2005, when I decided to host most of the services I use myself. I read the documentation available on the subject (essentially LARTC) but found it incomplete, and ended up reading the original research publications and the Linux source code.
I host most of my Internet services myself, at home or on small low-end servers (Dedibox and so on). This includes web hosting (this wiki, some blogs and a few websites), SMTP and IMAP servers, some FTP servers, XMPP, DNS and so on. French ISPs allow this, but only give you 1Mbps (128KB/s) of uplink, which corresponds to the TCP ACK rate necessary for a 20Mbps downlink.
1Mbps is enough for most usage, but without traffic control, any weekly backup to an external location fills up the DSL link and slows down everyone on the network. During that time, both visitors of this wiki and my wife chatting on Skype will experience high latency. This is not acceptable, because the priority of the weekly backup is a lot lower than the other two. Linux gives you the flexibility to shape the network traffic and use all of your bandwidth efficiently, without penalizing real-time applications. But this comes at a price: the learning curve to implement an efficient traffic control policy is quite steep. This document provides an accurate and comprehensive introduction to the most used QoS algorithms, and gives you the tools to implement and understand your own policy.
3 The basics of Traffic Control

In the Internet world, everything is packets. Managing a network means managing packets: how they are generated, routed, transmitted, reordered, fragmented, etc. Traffic Control works on packets leaving the system. Its objective is not, initially, to manipulate packets entering the system (although you could do that, if you really want to slow down the rate at which you receive packets). The Traffic Control code operates between the IP layer and the hardware driver that transmits data on the network. We are discussing a portion of code that works on the lower layers of the kernel's network stack. In fact, the Traffic Control code is the very one in charge of constantly supplying packets to send to the device driver.
It means that the TC module, the packet scheduler, is permanently active in the kernel. Even when you do not explicitly want to use it, it's there scheduling packets for transmission. By default, this scheduler maintains a basic queue (similar to a FIFO queue) in which the first packet to arrive is the first to be transmitted.
At the core, TC is composed of queuing disciplines, or qdiscs, that represent the scheduling policies applied to a queue. Several types of qdisc exist. If you are familiar with the way CPU schedulers work, you will find that some of the TC qdiscs are similar. We have FIFO, FIFO with multiple queues, FIFO with hash and round robin (SFQ). We also have a Token Bucket Filter (TBF) that assigns tokens to a qdisc to limit its flow rate (no token = no transmission = wait for a token). This last algorithm was then extended to a hierarchical mode called HTB (Hierarchical Token Bucket). And there are also Quick Fair Queuing (QFQ), Hierarchical Fair Service Curve (HFSC), Random Early Detection (RED), etc.
For a complete list of algorithms, check out the source code at kernel.org.
3.1 First contact
Let's skip the theory for now and start with an easy example. We have a web server on which we would like to limit the flow rate of packets leaving the server. We want to fix that limit at 200 kilobits per second (25 KB/s).
This set up is fairly simple, and we need three things:
* a Netfilter rule to mark the packets that we want to limit
* a Traffic Control policy
* a Filter to bind the packets to the policy
3.2 Netfilter MARK
Netfilter can be used to interact directly with the structure representing a packet in the kernel. This structure, the sk_buff, contains a field called “__u32 nfmark” that we are going to modify. TC will then read that value to select the destination class of a packet.
The following iptables rule will apply the mark '80' to outgoing packets (OUTPUT chain) sent by the web server (TCP source port is 80).
# iptables -t mangle -A OUTPUT -o eth0 -p tcp --sport 80 -j MARK --set-mark 80
We can control the application of this rule via the netfilter statistics:
# iptables -L OUTPUT -t mangle -v
Chain OUTPUT (policy ACCEPT 74107 packets, 109M bytes)
pkts bytes target prot opt in  out  source   destination
73896  109M MARK   tcp  --  any eth0 anywhere anywhere    tcp spt:www MARK xset 0x50/0xffffffff

You probably noticed that the rule is located in the mangle table. We will go back to that a little bit later.

3.3 Two classes in a tree
To manipulate TC policies, we need the /sbin/tc binary from the **iproute** package (aptitude install iproute).

The iproute package must match your kernel version. Your distribution's package manager will normally take care of that.

We are going to create a tree that represents our scheduling policy, and that uses the HTB scheduler. This tree will contain two classes: one for the marked traffic (TCP sport 80), and one for everything else.
# tc qdisc add dev eth0 root handle 1: htb default 20
# tc class add dev eth0 parent 1:0 classid 1:10 htb rate 200kbit ceil 200kbit prio 1 mtu 1500
# tc class add dev eth0 parent 1:0 classid 1:20 htb rate 824kbit ceil 1024kbit prio 2 mtu 1500
The two classes are attached to the root. Each class has a guaranteed bandwidth (the rate value) and an opportunistic bandwidth (the ceil value). If the totality of the bandwidth is not used, a class is allowed to increase its flow rate up to the ceil value. Otherwise, the rate value is applied. It means that the sum of the rate values must correspond to the total bandwidth available.
In the previous example, we consider the total upload bandwidth to be 1024kbits/s, so class 10 (web server) gets 200kbits/s and class 20 (everything else) gets 824 kbits/s.
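To make the rate/ceil arithmetic concrete, here is a small toy model in Python (a hypothetical sketch, not the kernel's HTB code): each class is guaranteed its rate, and unused link capacity can be borrowed up to the class's ceil.

```python
def htb_throughput(classes, demands):
    """Toy model of HTB borrowing: each class first gets min(demand, rate);
    leftover link capacity is then lent out, capped by each class's ceil.
    classes: dict name -> (rate, ceil) in kbit; demands: dict name -> kbit wanted."""
    link = sum(rate for rate, _ in classes.values())  # rates must sum to the link capacity
    got = {n: min(demands[n], classes[n][0]) for n in classes}
    spare = link - sum(got.values())
    for n in classes:  # naive single-pass lending, enough for two classes
        extra = min(spare, classes[n][1] - got[n], demands[n] - got[n])
        got[n] += extra
        spare -= extra
    return got

# The policy from the example: class 1:10 (web server) and class 1:20 (default)
classes = {"1:10": (200, 200), "1:20": (824, 1024)}

# Both classes saturated: each falls back to its guaranteed rate
print(htb_throughput(classes, {"1:10": 1000, "1:20": 1000}))  # {'1:10': 200, '1:20': 824}

# Web class idle: the default class borrows up to its ceil of 1024
print(htb_throughput(classes, {"1:10": 0, "1:20": 2000}))     # {'1:10': 0, '1:20': 1024}
```

The second call shows why ceil matters: with the web class idle, the default class is no longer capped at its 824kbit rate and absorbs the whole 1024kbit link.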

TC accepts both the kbit and kbps notations, but they don't have the same meaning: kbit is the rate in kilobits per second, and kbps is in kilobytes per second. In this article, I will use the kbit notation only.

3.4 Connecting the marks to the tree
We now have on one side a traffic shaping policy, and on the other side packets marking. To connect the two, we need a filter.
A filter is a rule that identifies packets (the handle parameter) and directs them to a class (the fw flowid parameter). Since several filters can work in parallel, they can also have a priority. A filter must be attached to the root of the QoS policy; otherwise, it won't be applied.
# tc filter add dev eth0 parent 1:0 protocol ip prio 1 handle 80 fw flowid 1:10
We can test the policy using a simple client/server setup. Netcat is very useful for such testing. Start a listening process on the server that applies the policy using:
# nc -l -p 80 < /dev/zero
And connect to it from another machine using:
# nc 192.168.1.1 80 > /dev/null
The server process will send zeros (taken from /dev/zero) as fast as it can, and the client will receive them and throw them away, as fast as it can.
Using iptraf to monitor the connection, we can supervise the bandwidth usage (bottom right corner).

The value is 199.20kbits/s, which is close enough to the 200kbits/s target. The precision of the scheduler depends on a few parameters that we will discuss later on.
Any other connection from the server that uses a source port different from TCP/80 will have a flow rate between 824kbits/s and 1024kbits/s (depending on the presence of other connections in parallel).
4 Twenty Thousand Leagues Under the Code
Now that we enjoyed this first contact, it is time to go back to the fundamentals of the Quality of Service of Linux. The goal of this chapter is to dive into the algorithms that compose the traffic control subsystem. Later on, we will use that knowledge to build our own policy.
The code of TC is located in the net/sched directory of the sources of the kernel. The kernel separates the flows entering the system (ingress) from the flows leaving it (egress). And, as we said earlier, it is the responsibility of the TC module to manage the egress path.
The illustration below shows the path of a packet inside the kernel: where it enters (ingress) and where it leaves (egress). If we focus on the egress path, a packet arrives from layer 4 (TCP, UDP, …) and then enters the IP layer (not represented here). The Netfilter chains OUTPUT and POSTROUTING are integrated into the IP layer and are located between the IP manipulation functions (header creation, fragmentation, …). At the exit of the NAT table of the POSTROUTING chain, the packet is transmitted to the egress queue, and this is where TC starts its work.
Almost all devices use a queue to schedule the egress traffic. The kernel possesses algorithms that can manipulate these queues: the queuing disciplines (FIFO, SFQ, HTB, …). The job of TC is to apply these queuing disciplines to the egress queue in order to select a packet for transmission.
TC works with the sk_buff structure that represents a packet in the kernel. It doesn't manipulate the packet data directly. sk_buff is a shared structure accessible anywhere in the kernel, thus avoiding unnecessary duplication of data. This method is a lot more flexible and a lot faster because sk_buff contains all of the information necessary to manipulate the packet; the kernel therefore avoids copying headers and payloads between different memory areas, which would ruin performance.

On a regular basis, the packet scheduler will wake up and launch any pre-configured scheduling algorithms to select a packet to transmit.
Most of the work is launched by the function dev_queue_xmit, from net/core/dev.c, which takes only a sk_buff structure as input (this is enough: sk_buff contains everything needed, such as skb→dev, a pointer to the output NIC).
dev_queue_xmit makes sure the packet is ready to be transmitted on the network, that the fragmentation is compatible with the capacity of the NIC, and that the checksums are calculated (if this is handled by the NIC, this step is skipped). Once those checks are done, and if the device has a queue in skb→dev→qdisc, the sk_buff structure of the packet is added to this queue (via the enqueue function) and the qdisc_run function is called to select a packet to send.
This means that the packet that has just been added to the NIC's queue might not be the one immediately transmitted, but we know that it is ready for subsequent transmission the moment it is added to the queue.
To each device is attached a root queuing discipline. This is what we defined earlier when creating the root qdisc to limit the flow rate of the web server:
# tc qdisc add dev eth0 root handle 1: htb default 20
This command means “attach a root qdisc identified by id 1 to the device eth0, use htb as a scheduler and send everything to class 20 by default”.
We will find this qdisc at the pointer skb→dev→qdisc. The enqueue and dequeue functions are also located there, respectively in skb→dev→qdisc→enqueue() and skb→dev→qdisc→dequeue(). This last dequeue function will be in charge of forwarding the sk_buff structure of the packet selected for transmission to the NIC's driver.
The root qdisc can have leaves, known as classes, that can also possess their own qdisc, thus constructing a tree.
4.1 Classless Disciplines
This is a pretty long way down. For those who wish to go deeper into the subject, I recommend reading “Understanding Linux Network Internals” by Christian Benvenuti, from O'Reilly.
We now have a dequeue function whose role is to select the packet to send to the network interface. To do so, this function calls the scheduling algorithms that we are now going to study. There are two types of algorithms: classless and classful. Classful algorithms are composed of qdiscs that can contain classes, as we did in the previous example with HTB. By contrast, classless algorithms cannot contain classes, and are (supposedly) simpler.
4.1.1 PFIFO_FAST
Let's start with a small one: pfifo_fast is the default scheduling algorithm, used when no other is explicitly defined. In other words, it's the one used on 99.9% of Linux systems. It is classless and a tiny bit more evolved than a basic FIFO queue, since it implements 3 bands working in parallel. These bands are numbered 0, 1 and 2 and emptied sequentially: while 0 is not empty, 1 will not be processed, and 2 is served last. So we have 3 priorities: packets in queue 0 are sent before packets in queue 2.
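The three-band behavior is easy to sketch in a few lines of Python (a toy model, not the kernel code): packets are enqueued into one of three FIFOs, and dequeue always drains the lowest-numbered non-empty band first.

```python
from collections import deque

class PfifoFast:
    """Toy model of pfifo_fast: 3 FIFO bands; band 0 is served strictly
    before band 1, and band 1 strictly before band 2."""
    def __init__(self):
        self.bands = [deque(), deque(), deque()]

    def enqueue(self, packet, band):
        self.bands[band].append(packet)

    def dequeue(self):
        for band in self.bands:          # always scan from band 0
            if band:
                return band.popleft()    # FIFO within a band
        return None                      # nothing to send

q = PfifoFast()
q.enqueue("bulk-1", 2)
q.enqueue("interactive-1", 0)
q.enqueue("bulk-2", 2)
q.enqueue("normal-1", 1)

# Band 0 is drained first, then band 1, then band 2:
print([q.dequeue() for _ in range(4)])
# ['interactive-1', 'normal-1', 'bulk-1', 'bulk-2']
```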
The kernel uses the Type of Service field (the 8-bit field from bit 8 to bit 15 of the IP header, see below) to select the destination band of a packet.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Datagram Header from RFC 791
This algorithm is defined in **net/sched/sch_generic.c** and represented in the diagram below.

The length of a band, representing the number of packets it can contain, is set to 1000 by default and defined outside of TC. It's a parameter that can be set using ifconfig, and inspected in /sys:
# cat /sys/class/net/eth0/tx_queue_len
1000
Once the default value of 1000 is exceeded, TC will start dropping packets. This should happen very rarely, because TCP makes sure to adapt its sending speed to the capacity of both systems participating in the communication (that's the role of TCP slow start). But experiments have shown that increasing that limit to 10,000, or even 100,000, in some very specific gigabit-network setups can improve performance. I wouldn't recommend touching this value unless you really know what you are doing: increasing a buffer to too large a size can have very negative side effects on the quality of a connection. Jim Gettys called that bufferbloat, and we will talk about it in the last chapter.
Because this algorithm is classless, it is not possible to plug another scheduler after it.
4.1.2 SFQ - Stochastic Fairness Queuing
Stochastic Fairness Queuing is an algorithm that shares the bandwidth without giving any privilege of any sort. The sharing is fair by being stochastic, or random if you prefer. The idea is to take a fingerprint, or hash, of the packet based on its header, and to use this hash to send the packet to one of the 1024 buckets available. The buckets are then emptied in a round-robin fashion.
The main advantage of this method is that no packet has priority over another. No connection can take over the others, and everybody has to share. The repartition of the bandwidth across the packets will almost always be fair, but there are some minor limitations. The main one is that the hashing algorithm might produce the same result for several packets, and thus send them to the same bucket. One bucket will then fill up faster than the others, breaking the fairness of the algorithm. To mitigate this, SFQ modifies the parameters of its hashing algorithm on a regular basis, by default every 10 seconds.
The diagram below shows how the packets are processed by SFQ, from entering the scheduler at the top, to being dequeued and transmitted at the bottom. The source code is available in net/sched/sch_sfq.c. Some variables are hard-coded in the source:
* SFQ_DEFAULT_HASH_DIVISOR gives the number of buckets and defaults to 1024
* SFQ_DEPTH defines the depth of each bucket, and defaults to 128 packets
#define SFQ_DEPTH		128 /* max number of packets per flow */
#define SFQ_DEFAULT_HASH_DIVISOR 1024
These two values determine the maximum number of packets that can be queued at the same time: 1024 * 128 = 131072 packets can be waiting in the SFQ scheduler before it starts dropping packets. But this theoretical number is very unlikely to ever be reached, because it would require the algorithm to fill up every single bucket evenly before starting to dequeue or drop packets.
The tc command line lists the options that can be fed to SFQ:
# tc qdisc add sfq help
Usage: ... sfq [ limit NUMBER ] [ perturb SECS ] [ quantum BYTES ]
* limit: the size of the buckets. It can be reduced below SFQ_DEPTH but not increased past it. If you try to set a value above 128, TC will simply ignore it.
* perturb: the frequency, in seconds, at which the hashing algorithm is updated.
* quantum: the maximum number of bytes that the round-robin dequeue process is allowed to dequeue from a bucket. At a bare minimum, this should be equal to the MTU of the connection (1500 bytes on Ethernet). Imagine that we set this value to 500 bytes: all packets bigger than 500 bytes would never be dequeued by SFQ, and would stay in their bucket. We would very quickly arrive at a point where all buckets are blocked and no packets are transmitted anymore.
4.1.2.1 SFQ Hashing algorithm
The hash used by SFQ is computed by the sfq_hash function (see the source code), and takes into account the source and destination IP addresses, the layer 4 protocol number (still in the IP header), and the TCP ports (if the IP packet is not fragmented). Those parameters are mixed with a random number regenerated every 10 seconds (the perturb value defines that interval).
Let's walk through a simplified version of the algorithm's steps. The connection has the following parameters:
* source IP: 126.255.154.140
* source port: 8080
* destination IP: 175.112.129.215
* destination port: 2146
/* IP source address in hexadecimal */
h1 = 7eff9a8c

/* IP destination address in hexadecimal */
h2 = af7081d7

/* 06 is the protocol number for TCP (bits 72 to 79 of the IP header).
We perform a XOR between the variable h2 obtained in the previous step
and the TCP protocol number
*/
h2 = h2 XOR 06

/* if the IP packet is not fragmented, we include the TCP ports in the hash*/
/* 1f900862 is the hexadecimal representation of the source and destination ports.
We perform another XOR with this value and the h2 variable
*/
h2 = h2 XOR 1f900862

/* And finally, we use the Jenkins algorithm with some additional "golden numbers"
This jhash function is defined somewhere else in the kernel source code*/
h = jhash(h1, h2, perturbation) 
The result is a 32-bit hash value that SFQ uses to select the destination bucket of the packet. Because the perturb value is regenerated every 10 seconds, the packets of a reasonably long connection will be directed to different buckets over time.
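The mixing steps above can be reproduced in a few lines of Python. This is only a sketch of the input mixing computed from the connection parameters listed above; the final jhash step is left out, since the real Jenkins hash (and the kernel's actual sfq_hash) is more involved:

```python
def sfq_mix(src_ip, dst_ip, proto, sport, dport):
    """Mix the connection parameters as in the simplified sfq_hash walk-through.
    Returns (h1, h2), which the kernel would then feed to jhash(h1, h2, perturb)
    to pick a bucket (the jhash step is not reproduced here)."""
    ip_to_int = lambda ip: int.from_bytes(bytes(map(int, ip.split("."))), "big")
    h1 = ip_to_int(src_ip)            # source address
    h2 = ip_to_int(dst_ip)            # destination address
    h2 ^= proto                       # XOR with the L4 protocol number
    h2 ^= (sport << 16) | dport       # XOR with the two 16-bit ports, if not fragmented
    return h1, h2

h1, h2 = sfq_mix("126.255.154.140", "175.112.129.215", 6, 8080, 2146)
print(hex(h1), hex(h2))  # 0x7eff9a8c 0xb0e089b3
```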
But this also means that SFQ might break the ordering of the sending process. If two packets from the same connection are placed in two different buckets, the second bucket might be processed before the first one, thus sending the second packet before the first. This is not a problem for TCP, which uses sequence numbers to reorder the packets when they reach their destination, but for UDP it might be.
For example, imagine that you have a syslog daemon sending logs to a central syslog server. With SFQ, it might happen that a log message arrives before the one that precedes it. If you don't like that, use TCP.
TC gives us some more flexibility on the hashing algorithm. We can, for example, modify the fields considered by the hashing process. This is done using TC filters, as follows:
# tc qdisc add dev eth0 root handle 1: sfq perturb 10 quantum 3000 limit 64
# tc filter add dev eth0 parent 1:0 protocol ip handle 1 flow hash keys src,dst divisor 1024
The filter above simplifies the hash to keep only the source and destination IP addresses as input parameters. The divisor value is the number of buckets, as seen before. We could, then, create an SFQ scheduler that works with only 10 buckets and considers only the IP addresses of the packets in the hash.
This discipline is classless as well, which means we cannot direct packets to another scheduler when they leave SFQ. Packets are transmitted to the network interface only.
4.2 Classful Disciplines
4.2.1 TBF - Token Bucket Filter
Until now, we have looked at algorithms that do not allow control of the amount of bandwidth. SFQ and PFIFO_FAST give the ability to smooth the traffic, and even to prioritize it a bit, but not to control its throughput.
In fact, the main problem when controlling bandwidth is finding an efficient accounting method. Because counting in memory in real time is extremely difficult and costly, computer scientists took a different approach here.
Instead of counting the packets (or the bits transmitted by the packets, which is the same thing), the Token Bucket Filter algorithm sends, at a regular interval, a token into a bucket. This is disconnected from the actual packet transmission, but when a packet enters the scheduler, it consumes a certain number of tokens. If there are not enough tokens for it to be transmitted, the packet waits.
Until now, with SFQ and PFIFO_FAST, we were talking about packets, but with TBF we have to look at the bits contained in the packets. Let's take an example: a packet carrying 8000 bits (1KB) wishes to be transmitted. It enters the TBF scheduler, and TBF checks the content of its bucket: if there are 8000 tokens in the bucket, TBF destroys them and the packet can pass. Otherwise, the packet waits until the bucket has enough tokens.
The frequency at which tokens are added to the bucket determines the transmission speed, or rate. This is the parameter at the core of the TBF algorithm, shown in the diagram below.

Another particularity of TBF is that it allows bursts. This is a natural side effect of the algorithm: the bucket fills up at a continuous rate, but if no packets are transmitted for some time, the bucket gets completely full. The next packets to enter TBF are then transmitted right away, without having to wait and without any limit applied to them, until the bucket is empty. This is called a burst, and in TBF the burst parameter defines the size of the bucket.
So with a very large burst value, say 1,000,000 tokens, we would let a maximum of 83 fully loaded packets (roughly 124KBytes if they all carry their maximum MTU) traverse the scheduler without applying any sort of limit to them.
To overcome this problem and provide better control over bursts, TBF implements a second bucket, smaller and generally the same size as the MTU. This second bucket cannot store a large amount of tokens, but its replenishing rate is a lot faster than that of the big bucket. This second rate is called peakrate, and it determines the maximum speed of a burst.
Let's take a step back and look at those parameters again. We have:
* peakrate > rate: the second bucket fills up faster than the main one, to allow and control bursts. If the peakrate value is infinite, TBF behaves as if the second bucket didn't exist: packets are dequeued according to the main bucket, at the speed of rate.
* burst > MTU: the size of the first bucket is a lot larger than the size of the second bucket. If burst is equal to MTU, then peakrate is equal to rate and there is no burst.
So, to summarize: when everything works smoothly, packets are enqueued and dequeued at the speed of rate. If tokens are available when packets enter TBF, those packets are transmitted at the speed of peakrate until the first bucket is empty. This is represented in the diagram above, and in the source code at net/sched/sch_tbf.c, the interesting function being tbf_dequeue.
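The single-bucket behavior described above is easy to simulate. Here is a toy model in Python (one bucket, no peakrate, and instantaneous transmission, so not the sch_tbf code): tokens accrue at rate, are capped at burst, and a packet departs only once enough tokens are available.

```python
def tbf_simulate(packets, rate, burst):
    """Toy single-bucket TBF: tokens (bits) accrue at `rate` per second,
    capped at `burst`; a packet leaves once the bucket holds its size in tokens.
    packets: list of (arrival_time_s, size_bits). Returns departure times."""
    tokens, clock = burst, 0.0         # the bucket starts full
    departures = []
    for arrival, size in packets:
        now = max(arrival, clock)      # can't depart before the previous packet
        tokens = min(burst, tokens + (now - clock) * rate)  # refill for elapsed time
        clock = now
        if tokens < size:              # not enough tokens: wait for the refill
            wait = (size - tokens) / rate
            clock += wait
            tokens += wait * rate
        tokens -= size                 # the packet consumes its size in tokens
        departures.append(clock)
    return departures

# 200 kbit/s rate, 12000-token bucket, three back-to-back 8000-bit (1KB) packets:
# the full bucket lets the first packet through instantly (a small burst), then
# the flow settles at one packet every 8000 / 200000 = 0.04 s.
times = tbf_simulate([(0.0, 8000), (0.0, 8000), (0.0, 8000)], rate=200_000, burst=12_000)
print([round(t, 3) for t in times])  # [0.0, 0.02, 0.06]
```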
The configurable options of the TBF scheduler are listed in TC:
# tc qdisc add tbf help
Usage: ... tbf limit BYTES burst BYTES[/BYTES] rate KBPS [ mtu BYTES[/BYTES] ]
[ peakrate KBPS ] [ latency TIME ] [ overhead BYTES ] [ linklayer TYPE ]
We recognize the burst, rate, mtu and peakrate parameters discussed above. The limit parameter is the size of the packet queue (see diagram above). latency is another way of expressing the limit parameter, by setting the maximum time a packet can wait in the queue (the size of the queue is then derived from it; the calculation includes the values of burst, rate, peakrate and mtu). overhead and linklayer are two other parameters whose story is quite interesting. Let's take a look at those now.
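As a rough illustration of the latency-to-limit relationship (a simplified estimate of my own, not the exact formula tc uses, which also folds in peakrate and mtu): a queue drained at rate that may hold a packet for at most latency seconds can store about rate × latency bytes, plus the burst.

```python
def tbf_limit_from_latency(rate_kbit, latency_ms, burst_bytes):
    """Approximate the TBF 'limit' (queue size in bytes) implied by a latency bound.
    Simplified: limit ~= rate * latency + burst. tc's own computation also
    accounts for peakrate and mtu, so treat this as an order-of-magnitude estimate."""
    rate_bytes_per_s = rate_kbit * 1000 / 8
    return int(rate_bytes_per_s * latency_ms / 1000 + burst_bytes)

# 200 kbit/s with a 50 ms latency budget and a 1540-byte burst:
print(tbf_limit_from_latency(200, 50, 1540))  # 2790
```

In other words, with the web-server rate from our example, a 50 ms latency budget only leaves room for roughly two full-size Ethernet frames in the queue.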
4.2.1.1 DSL and ATM, the Ox that took itself for a Frog
If you have ever read Jean de La Fontaine, you probably know the fable of “The Frog That Wished to Be as Big as the Ox”. Well, in our case, it's the opposite, and your packets might not be as small as they think they are.
While most local networks use Ethernet, until recently a lot of Internet access links (at least in Europe) ran over the ATM protocol. Nowadays, ISPs are moving toward all-over-IP, but ATM is still around. The particularity of ATM is to split large Ethernet packets into much smaller ones, called “cells”. A 1500-byte Ethernet packet is split into ~32 smaller ATM cells of just 53 bytes each. And of those 53 bytes, only 48 come from the original packet; the rest is occupied by the ATM headers.
So where is the problem? Consider the following network topology.

The QoS box is in charge of scheduling packets before transmitting them to the modem. The packets are then split by the modem into ATM cells. So our initial 1500-byte Ethernet packet is split into 32 ATM cells, for a total size of 32 cells * 5 bytes of headers per cell + 1500 bytes of data = (32*5)+1500 = 1660 bytes. 1660 bytes is about 10.7% bigger than 1500: when ATM is used, we lose roughly 10% of bandwidth compared to an Ethernet network (an estimate that depends on the average packet size, etc.).
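That per-cell overhead is easy to reproduce. The calculation below follows the rough estimate in the text and, like it, ignores the padding of the last, partially filled cell:

```python
ATM_CELL = 53                          # total ATM cell size, bytes
ATM_PAYLOAD = 48                       # payload bytes carried per cell
ATM_HEADER = ATM_CELL - ATM_PAYLOAD    # 5 bytes of header per cell

def atm_overhead(packet_bytes):
    """Extra header bytes added when an Ethernet packet is split into ATM cells."""
    cells = -(-packet_bytes // ATM_PAYLOAD)   # ceiling division
    return cells * ATM_HEADER

extra = atm_overhead(1500)                 # 32 cells * 5 bytes = 160 bytes
print(extra, f"{extra / 1500:.1%}")        # 160, about 10.7% of the packet
```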
If TBF doesn't know about this and calculates its rate based solely on the Ethernet MTU, it will transmit 10% more packets than the modem can handle. The modem will start queuing, and eventually dropping, packets. The TCP stacks will have to adjust their speed, traffic becomes erratic and we lose the benefit of TC as a traffic shaper.
Jesper Dangaard Brouer did his Master's thesis on this topic, and wrote a few patches for the kernel and TC. These patches implement the overhead and linklayer parameters, which can be used to inform the TBF scheduler of the type of link to account for.
We can use these parameters to fine-tune the creation of a TBF scheduling discipline:
# tc qdisc add dev eth0 root tbf rate 1mbit burst 10k latency 30ms linklayer atm

# tc -s qdisc show dev eth0
qdisc tbf 8005: root rate 1000Kbit burst 10Kb lat 30.0ms
Sent 738 bytes 5 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
By setting linklayer to “atm”, we account for an overhead of 5 bytes per cell later in the transmission, thereby preventing the modem from buffering the packets.
4.2.1.2 The limits of TBF
TBF gives pretty accurate control over the bandwidth assigned to a qdisc. But it also imposes that all packets pass through a single queue. If a big packet is blocked because there are not enough tokens to send it, smaller packets - which could potentially be sent instead - are blocked behind it as well. This is the case represented in the diagram above, where packet #2 is stuck behind packet #1. We could optimize bandwidth usage by allowing the smaller packet to be sent instead of the bigger one. We would, however, run into the same packet-reordering problem that we discussed with the SFQ algorithm.
The other solution would be to give more flexibility to the traffic shaper, declare several TBF queues in parallel, and route the packets to one or the other using filters. We could also allow those parallel queues to borrow tokens from each other, in case one is idle and the other one is not.
We have just prepared the ground for classful qdiscs, and the Hierarchical Token Bucket.
4.2.2 HTB - Hierarchical Token Bucket
The Hierarchical Token Bucket (HTB) is an improved version of TBF that introduces the notion of classes. Each class is, in fact, a TBF-like qdisc, and classes are linked together as a tree, with a root and leaves. HTB introduces a number of features to improve bandwidth management, such as priorities between classes, a way to borrow bandwidth from another class, and the possibility to plug another qdisc (an SFQ, for example) as an exit point.
Let's take a simple example, represented in the diagram below.

htb_en.dia.zip
The tree is created with the commands below:
# tc qdisc add dev eth0 root handle 1: htb default 20
# tc class add dev eth0 parent 1:0 classid 1:10 htb rate 200kbit ceil 400kbit prio 1 mtu 1500
# tc class add dev eth0 parent 1:0 classid 1:20 htb rate 200kbit ceil 200kbit prio 2 mtu 1500
HTB uses terminology similar to that of TBF and SFQ:
- burst is identical to the burst of TBF: the size of the token bucket.
- rate is the speed at which tokens are generated and put in the bucket, the speed of the leaf, like in TBF.
- quantum is similar to the quantum discussed in SFQ: the number of bytes to serve from the leaf at once.
The new parameters are ceil and cburst. Let us walk through the tree to see how they work.
In the example above, we have a root qdisc with handle 1: and two leaf classes, 1:10 and 1:20. The root applies filters to decide which leaf a packet is directed to; we will discuss those later. By default, packets are sent to leaf #20 (default 20).
Leaf #10 has a rate of 200kbit/s, a ceil of 400kbit/s (which means it can borrow 200kbit/s more than its rate) and a priority (prio) of 1.
Leaf #20 has a rate of 200kbit/s, a ceil of 200kbit/s (which means it cannot borrow anything, since rate == ceil) and a priority of 2.
Each HTB leaf is, at any point, in one of the three following states:
- HTB_CANT_SEND: the class can neither send nor borrow; no packets are allowed to leave the class.
- HTB_MAY_BORROW: the class cannot send using its own tokens, but can try to borrow from another class.
- HTB_CAN_SEND: the class can send using its own tokens.
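These states can be sketched as a toy classifier. This is purely illustrative: the names mirror the list above, the two counters stand for the rate and ceil buckets, and the kernel's actual logic works on token deficits rather than this simplified form:

```python
# Illustrative state values; the names mirror the article's list above.
HTB_CANT_SEND, HTB_MAY_BORROW, HTB_CAN_SEND = range(3)

def leaf_state(tokens, ctokens):
    """Toy classifier: `tokens` is the class's own rate bucket,
    `ctokens` the ceil bucket that also gates borrowed bandwidth."""
    if tokens > 0:
        return HTB_CAN_SEND       # enough own tokens: send directly
    if ctokens > 0:
        return HTB_MAY_BORROW     # own bucket empty, but still under ceil
    return HTB_CANT_SEND          # over ceil: nothing may leave the class
```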
Imagine a group of packets that enter TC and are marked with the flag #10, and are therefore directed to leaf #10. The bucket of leaf #10 does not contain enough tokens to let the first packet pass, so the leaf will try to borrow some from its neighbor, leaf #20. The quantum value of leaf #10 is set to the MTU (1500 bytes), which means the maximal amount of data that leaf #10 will try to send at once is 1500 bytes. If packet #1 is 1400 bytes and the bucket of leaf #10 has enough tokens for 1000 bytes, then the leaf will try to borrow the remaining 400 bytes' worth of tokens from its neighbor, leaf #20.
The quantum is the maximal number of bytes that a leaf will try to send at once. The closer the value is to the MTU, the more accurate the scheduling, because we reschedule after every 1500 bytes. And the larger the quantum, the more a leaf is privileged: it is allowed to borrow more tokens from its neighbors. But since the total number of tokens in the tree is limited, a token borrowed by one leaf is a token another leaf cannot use anymore. So the bigger the quantum, the more a leaf can steal from its neighbors. This is tricky, because those neighbors might very well have packets to send as well.
When configuring TC, we do not manipulate the value of quantum directly. There is an intermediary parameter called r2q from which the quantum is computed automatically: quantum = rate / r2q. By default, r2q is set to 10, so for a rate of 200kbit/s, quantum is 20kbit (2500 bytes).
For very small or very large bandwidths, it is important to tune r2q properly. If r2q is too large, too many packets leave a queue at once. If r2q is too small, not enough packets are sent.

One important detail is that r2q is set on the root qdisc once and for all. It cannot be configured for each leaf separately.
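Following the formula quantum = rate / r2q, with the rate taken in bytes per second as in tc's help text (“DRR quantums are computed as rate in Bps/r2q”):

```python
def htb_quantum(rate_bps, r2q=10):
    """Quantum in bytes: rate converted to bytes per second, divided by r2q."""
    return int(rate_bps / 8 / r2q)

# 200 kbit/s with the default r2q of 10 -> 2500 bytes (i.e. 20 kbit)
print(htb_quantum(200_000))          # 2500
# the gateway script later in this article uses r2q 40 on a 1600 kbit root
print(htb_quantum(1_600_000, 40))    # 5000
```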

TC offers the following configuration options for HTB:
Usage: ... qdisc add ... htb [default N] [r2q N]
default  minor id of class to which unclassified packets are sent {0}
r2q      DRR quantums are computed as rate in Bps/r2q {10}
debug    string of 16 numbers each 0-3 {0}

... class add ... htb rate R1 [burst B1] [mpu B] [overhead O]
[prio P] [slot S] [pslot PS]
[ceil R2] [cburst B2] [mtu MTU] [quantum Q]
rate     rate allocated to this class (class can still borrow)
burst    max bytes burst which can be accumulated during idle period {computed}
mpu      minimum packet size used in rate computations
ceil     definite upper class rate (no borrows) {rate}
cburst   burst but for ceil {computed}
mtu      max packet size we create rate map for {1600}
prio     priority of leaf; lower are served first {0}
quantum  how much bytes to serve from leaf at once {use r2q}
As you can see, we are now familiar with all of those parameters. If you jumped straight to this section without reading about SFQ and TBF, please refer to those chapters for a detailed explanation of what these parameters do.

Remember that, when configuring leaves in HTB, the sum of the rates of the leaves cannot be higher than the rate of the root. It makes sense, right?

4.2.2.1 Hysteresis and HTB
Hysteresis. If this obscure word is not familiar to you, as it wasn't to me, here is how Wikipedia defines it: “Hysteresis is the dependence of a system not just on its current environment but also on its past.”
Hysteresis is a side effect introduced by an optimization of HTB. In order to reduce the load on the CPU, HTB initially did not recalculate the content of the buckets often enough, thereby allowing some classes to consume more tokens than they actually held, without borrowing.
The problem was corrected, and a parameter was introduced to enable or disable this approximation in HTB's calculations. The kernel developers kept the optimization because it can prove useful on high-traffic networks, where recalculating the content of the buckets every time is simply not doable.
But in most cases, this optimization is simply deactivated, as shown below:
# cat /sys/module/sch_htb/parameters/htb_hysteresis
0
4.2.3 CoDel
http://queue.acm.org/detail.cfm?id=2209336
4.2.4 FQ_CoDel

4.2.5 HFSC - Hierarchical Fair Service Curve

4.2.6 QFQ - Quick Fair Queueing

4.2.7 RED - Random Early Detection

4.2.8 CHOKe - CHOose and {Keep,Kill}

5 Shaping the traffic on the Home Network
Home networks are tricky to shape, because everybody wants priority and it's difficult to predetermine a usage pattern. In this chapter, we will build a TC policy that answers general needs:
- Low latency. The uplink is only 1.5Mbps and the latency shouldn't exceed 30ms under high load. We can tune the buffers in the qdiscs to ensure that our packets will not wait 500ms in a large queue before being processed.
- High UDP responsiveness, for applications like Skype and DNS queries.
- Guaranteed HTTP/S bandwidth: half of the uplink is dedicated to HTTP traffic (although other classes can borrow from it) to ensure that web browsing, probably 80% of home network usage, stays smooth and responsive.
- Higher priority for TCP ACKs and SSH traffic. In the age of Netflix and HD VoD, it's necessary to ensure fast download speeds, and for that you need to send TCP ACKs as fast as possible. This is why those packets get a higher priority than the HTTP traffic.
- A general class for everything else.
This policy is represented in the diagram below. We will use PFIFO_FAST and SFQ termination qdisc once we exit HTB to perform some additional scheduling (and prevent a single HTTP connection from eating all of the bandwidth, for example).

DIA source

get the bash script from github
Below is one of the sections, in charge of creating the class for SSH. I have replaced the variables with their values for readability.
# SSH class: for outgoing connections to
# however, an SSH connection cannot fill up
# the connection to more than 70%

echo "#---ssh - id 300 - rate 160 kbit ceil 1120 kbit"
/sbin/tc class add dev eth0 parent 1:1 classid 1:300 htb \
rate 160kbit ceil 1120kbit burst 15k prio 3

# SFQ will mix the packets if there are several
# SSH connections in parallel
# and ensure that none has the priority

echo "#--- ~ sub ssh: sfq"
/sbin/tc qdisc add dev eth0 parent 1:300 handle 1300: \
sfq perturb 10 limit 32

echo "#--- ~ ssh filter"
/sbin/tc filter add dev eth0 parent 1:0 protocol ip \
prio 3 handle 300 fw flowid 1:300

echo "#--- ~ netfilter rule - SSH at 300"
/sbin/iptables -t mangle -A POSTROUTING -o eth0 -p tcp \
--tcp-flags SYN SYN --dport 22 -j CONNMARK \
--set-mark 300
The first rule is the definition of the HTB class, the leaf. It connects back to its parent 1:1, defines a rate of 160kbit/s and can use up to 1120kbit/s by borrowing the difference from other leaves. The burst value is set to 15k, which is 10 full packets with an MTU of 1500 bytes.
The second rule defines an SFQ qdisc connected to the HTB leaf above. That means that once packets have passed the HTB leaf, they pass through an SFQ before being transmitted. The SFQ ensures that multiple parallel connections are mixed before being transmitted, and that a single connection cannot eat the whole bandwidth. We limit the size of the SFQ queue to 32 packets, instead of the default of 128.
Then comes the TC filter in the third rule. This filter checks the handle of each packet or, to be more accurate, the value of nf_mark in the sk_buff representation of the packet in the kernel. Using this mark, the filter directs SSH packets to the HTB leaf above.

Even though this rule is located in the SSH class block for clarity, you might have noticed that the filter has the root qdisc for parent (parent 1:0). Filters are always attached to the root qdisc, and not to the leaves. That makes sense, because the filtering must be done at the entrance of the traffic control layer.

And finally, the fourth rule is the iptables rule that applies a mark to SYN packets leaving the gateway (connection establishments). Why SYN packets only? To avoid performing complex matching on every packet of every connection. We rely on netfilter's ability to maintain stateful information to propagate a mark placed on the first packet of a connection to all the other packets of that connection. This is done by the following rule at the end of the script:
echo "#--- ~ propagating marks on connections"
iptables -t mangle -A POSTROUTING -j CONNMARK --restore-mark
Let us now load the script on our gateway, and visualise the qdisc created.
# /etc/network/if-up.d/lnw_gateway_tc.sh start

#-cleanup
#-define a HTB root qdisc
#--uplink - rate 1600 kbit ceil 1600 kbit
#---interactive - id 100 - rate 160 kbit ceil 1600 kbit
#--- ~ sub interactive: pfifo
#--- ~ interactive filter
#--- ~ netfilter rule - all UDP traffic at 100
#---tcp acks - id 200 - rate 320 kbit ceil 1600 kbit
#--- ~ sub tcp acks: pfifo
#--- ~ filtre tcp acks
#--- ~ netfilter rule for TCP ACKs will be loaded at the end
#---ssh - id 300 - rate 160 kbit ceil 1120 kbit
#--- ~ sub ssh: sfq
#--- ~ ssh filter
#--- ~ netfilter rule - SSH at 300
#---http branch - id 400 - rate 800 kbit ceil 1600 kbit
#--- ~ sub http branch: sfq
#--- ~ http branch filter
#--- ~ netfilter rule - http/s
#---default - id 999 - rate 160kbit ceil 1600kbit
#--- ~ sub default: sfq
#--- ~ filtre default
#--- ~ propagating marks on connections
#--- ~ Mark TCP ACKs flags at 200

Traffic Control is up and running
# /etc/network/if-up.d/lnw_gateway_tc.sh show

---- qdiscs details -----
qdisc htb 1: root refcnt 2 r2q 40 default 999 direct_packets_stat 0 ver 3.17
qdisc pfifo 1100: parent 1:100 limit 10p
qdisc pfifo 1200: parent 1:200 limit 10p
qdisc sfq 1300: parent 1:300 limit 32p quantum 1514b flows 32/1024 perturb 10sec
qdisc sfq 1400: parent 1:400 limit 32p quantum 1514b flows 32/1024 perturb 10sec
qdisc sfq 1999: parent 1:999 limit 32p quantum 1514b flows 32/1024 perturb 10sec

---- qdiscs statistics --
qdisc htb 1: root refcnt 2 r2q 40 default 999 direct_packets_stat 0
Sent 16776950 bytes 125321 pkt (dropped 4813, overlimits 28190 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc pfifo 1100: parent 1:100 limit 10p
Sent 180664 bytes 1985 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc pfifo 1200: parent 1:200 limit 10p
Sent 5607402 bytes 100899 pkt (dropped 4813, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 1300: parent 1:300 limit 32p quantum 1514b perturb 10sec
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 1400: parent 1:400 limit 32p quantum 1514b perturb 10sec
Sent 9790497 bytes 15682 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 1999: parent 1:999 limit 32p quantum 1514b perturb 10sec
Sent 1198387 bytes 6755 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0 
The outputs above are just two of the types of output tc can generate. You might also find the class statistics helpful for diagnosing leaf consumption:
# tc -s class show dev eth0

[...truncated...]

class htb 1:400 parent 1:1 leaf 1400: prio 4 rate 800000bit ceil 1600Kbit burst 30Kb cburst 1600b
Sent 10290035 bytes 16426 pkt (dropped 0, overlimits 0 requeues 0)
rate 23624bit 5pps backlog 0b 0p requeues 0
lended: 16424 borrowed: 2 giants: 0
tokens: 4791250 ctokens: 120625
Above are the detailed statistics for the HTTP leaf: you can see the accumulated rate and packets-per-second statistics, but also the tokens accumulated, lent, borrowed, etc. This is the most helpful output for diagnosing your policy in depth.
We mentioned that overly large buffers can have a negative impact on the performance of a connection. But how bad is it, exactly?
The answer to that question was investigated by Jim Gettys when he found his home network to be inexplicably slow.
He found that, while we kept increasing the bandwidth of network connections, we didn't worry about latency at all. Those two factors are quite different, and both are critical to the quality of a network. Allow me to quote Gettys's FAQ here:
A 100 Gigabit network is always faster than a 1 megabit network, isn't it? More bandwidth is always better! I want a faster network!

No, such a network can easily be much slower. Bandwidth is a measure of capacity, not a measure of how fast the network can respond. You pick up the phone to send a message to Shanghai immediately, but dispatching a cargo ship full of blu-ray disks will be amazingly slower than the telephone call, even though the bandwidth of the ship is billions and billions of times larger than the telephone line. So more bandwidth is better only if its latency (speed) meets your needs. More of what you don't need is useless.

Bufferbloat destroys the speed we really need.
More information on Gettys's page, and in this paper from 1996: It's the Latency, Stupid.
Long story short: with bad latency but large bandwidth, you can transfer very large files efficiently, but a simple DNS query will take a lot longer than it should. And since DNS queries, and other small messages such as VoIP packets, are very often time sensitive, bad latency hurts them a lot (a single packet takes several hundred milliseconds to traverse the network).
So how does that relate to buffers? Gettys proposes a simple experiment to illustrate the problem.
We said earlier that Linux ships with a default txqueuelen of 1000. This value was increased in the kernel when gigabit Ethernet cards became the standard. But not all links are gigabit, far from it. Consider the following two computers:

We will call them desktop and laptop. They are both gigabit, and the switch is gigabit.
If we verify the configuration of their network interfaces, we can confirm that:
- the interfaces are configured in gigabit via ethtool
- the txqueuelen is set to 1000
# ifconfig eth0|grep txqueuelen
collisions:0 txqueuelen:1000

# ethtool eth0|grep Speed
Speed: 1000Mb/s
On one machine, launch nttcp with the '-i' switch to make it wait for connections:
# nttcp -i
On the laptop, launch “nttcp -t -D -n2048000 <serverIP>”, where:
- -t means this machine is “transmitting”, i.e. sending data
- -D disables TCP_NODELAY (see the Red Hat documentation)
- -n is the number of 4096-byte buffers given to the socket
# nttcp -t -D -n2048000 192.168.1.220
And at the same time, on the laptop, launch a ping of the desktop.
64 bytes from 192.168.1.220: icmp_req=1 ttl=64 time=0.300 ms
64 bytes from 192.168.1.220: icmp_req=2 ttl=64 time=0.386 ms
64 bytes from 192.168.1.220: icmp_req=3 ttl=64 time=19.2 ms
64 bytes from 192.168.1.220: icmp_req=4 ttl=64 time=19.2 ms
64 bytes from 192.168.1.220: icmp_req=5 ttl=64 time=19.2 ms
64 bytes from 192.168.1.220: icmp_req=6 ttl=64 time=19.2 ms
64 bytes from 192.168.1.220: icmp_req=7 ttl=64 time=19.3 ms
64 bytes from 192.168.1.220: icmp_req=8 ttl=64 time=19.0 ms
64 bytes from 192.168.1.220: icmp_req=9 ttl=64 time=0.281 ms
64 bytes from 192.168.1.220: icmp_req=10 ttl=64 time=0.362 ms
The first two pings were launched before nttcp. When nttcp starts, the latency increases, but it is still acceptable.
Now, reduce the speed of each network card, on the desktop and the laptop, to 100Mbps. The command is:
#ethtool -s eth0 speed 100 duplex full

# ethtool eth0|grep Speed
Speed: 100Mb/s
And run the same test again. After 60 seconds, here is the latency I get:
64 bytes from 192.168.1.220: icmp_req=75 ttl=64 time=183 ms
64 bytes from 192.168.1.220: icmp_req=76 ttl=64 time=179 ms
64 bytes from 192.168.1.220: icmp_req=77 ttl=64 time=181 ms
And one last time, with an Ethernet speed of 10Mbps:
64 bytes from 192.168.1.220: icmp_req=187 ttl=64 time=940 ms
64 bytes from 192.168.1.220: icmp_req=188 ttl=64 time=937 ms
64 bytes from 192.168.1.220: icmp_req=189 ttl=64 time=934 ms
Almost one second of latency between two machines sitting next to each other. Every time we divide the speed of the interface by an order of magnitude, we increase the latency by an order of magnitude as well. Under that load, opening an SSH connection from the laptop to the desktop takes more than 10 seconds, because we have a latency of almost 1 second per packet.
Now, while this last test is running, and while you are enjoying the ridiculous latency of your SSH session to the desktop, we will get rid of the transmit queue and the Ethernet buffers.
# ifconfig eth0 |grep txqueuelen
collisions:0 txqueuelen:1000

#  ethtool -g eth0
Ring parameters for eth0:
[...]
Current hardware settings:
[...]
TX:		511
We start by changing the txqueuelen value on the laptop machine from 1000 to zero. The latency will not change.
# ifconfig eth0 txqueuelen 0
64 bytes from 192.168.1.220: icmp_req=1460 ttl=64 time=970 ms
64 bytes from 192.168.1.220: icmp_req=1461 ttl=64 time=967 ms
Then we reduce the size of the TX ring of the Ethernet card. Now that we have barely any buffer left, let's see what happens:
# ethtool -G eth0 tx 32
64 bytes from 192.168.1.220: icmp_req=1495 ttl=64 time=937 ms
64 bytes from 192.168.1.220: icmp_req=1499 ttl=64 time=0.865 ms
64 bytes from 192.168.1.220: icmp_req=1500 ttl=64 time=60.3 ms
64 bytes from 192.168.1.220: icmp_req=1501 ttl=64 time=53.1 ms
64 bytes from 192.168.1.220: icmp_req=1502 ttl=64 time=49.2 ms
64 bytes from 192.168.1.220: icmp_req=1503 ttl=64 time=45.7 ms
The latency just got divided by 20! We dropped from almost one second to barely 50ms. This is the effect of excessive buffering in a network, and this is what happens, today, in most Internet routers.
6.1 What happens in the buffer?
If we take a look at the Linux networking stack, we see that the TCP stack sits well above the transmit queue and the Ethernet buffer. During a normal TCP connection, the TCP stack starts sending and receiving packets at a normal rate, then accelerates its sending speed exponentially: send 2 packets, receive ACKs, send 4 packets, receive ACKs, send 8 packets, receive ACKs, send 16 packets, receive ACKs, and so on.
This is known as TCP slow start. The mechanism works fine in practice, but the presence of large buffers breaks it.
A buffer of 1MB on a 1Gbit/s link will empty in ~8 milliseconds. But the same buffer on a 1Mbit/s link will take 8 seconds to empty. During those 8 seconds, the TCP stack thinks that all the packets it sent have been transmitted, and will probably continue to increase its sending speed. The subsequent packets get dropped, the TCP stack panics, drops its sending rate, and restarts the slow start procedure from scratch: 2 packets, get ACKs, 4 packets, get ACKs, etc.
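The two drain times above are simple arithmetic:

```python
def drain_time_s(buffer_bytes, link_bps):
    """Time (in seconds) to empty a full buffer over a link of the given speed."""
    return buffer_bytes * 8 / link_bps

print(drain_time_s(1_000_000, 1_000_000_000))  # 1 MB at 1 Gbit/s -> 0.008 s
print(drain_time_s(1_000_000, 1_000_000))      # 1 MB at 1 Mbit/s -> 8.0 s
```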
But while the TCP stack was filling up the TX buffers, all the other packets that our system wanted to send either got stuck somewhere in the queue, with several hundred milliseconds of delay before being transmitted, or were simply dropped.
The problem happens on the TX queue of the sending machine, but also on all the buffers of the intermediary network devices. And this is why Gettys went to war against the home router vendors.

Reposted from: http://wiki.linuxwall.info/doku.php/en:ressources:dossiers:networking:traffic_control

Image Retrieval: Ideas, influences, and trends of the new age
ACM literature

Translation notes.

Original paper link: http://infolab.stanford.edu/~wangz/project/imsearch/review/JOUR/datta_TR.pdf
This is a fairly recent and fairly good survey of image retrieval; some of its content is useful when writing a paper.

2. The various gaps introduced:
—Sensory. The sensory gap is the gap between the object in the world and the information in a (computational) description derived from a recording of that scene.
—Semantic. The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data has for a user in a given situation.

Image retrieval in the past (1990-2000)
3. Smeulders et al. [2000] divide image search into two domains: narrow and broad.
Narrow image domain: limited variability, more easily defined visual features (specific images, such as medical images).

Three broad categories of image search:
(1) Search by association: there is no clear intent for a particular image; the search proceeds by iteratively refined browsing.
(2) Aimed search: searching for one specific image.
(3) Category search: searching for a single image representative of a semantic class.
The overall goal therefore remains to bridge the semantic and sensorial gaps using the available visual features of images and relevant domain knowledge to support the varied search categories, ultimately to satiate the user.
4. Extracting visual content from an image has two parts: image processing and feature construction.
In this context, search has been described as a specification of minimal invariant conditions that model the user intent, geared at reducing the sensory gap due to accidental distortions, clutter, occlusion, etc.

5. Once image features were extracted, the question remained as to how they could be indexed and matched against each other for retrieval.

6. Modern image retrieval (2000-2008)

I. User intent
(1) Browser: a user browsing images with no final goal; a browsing session consists of a series of unrelated searches and may span many different topics.
(2) Surfer: a user with a moderately clear final goal; a surfer may start with exploratory behavior, so that subsequent searches better target what is needed from the system.
(3) Searcher: a user who knows very clearly what information is needed from the system; sessions are usually short, with related searches converging on the desired result.
One of the few such studies categorizes users as experts and novices and studies their interaction patterns with respect to a video library:
CHRISTEL, M. G. AND CONESCU, R. M. 2005. Addressing the challenge of visual information access from digital image and video libraries. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL).

Summary: what matters to the end user is the interaction between system and user and the corresponding response, hence the later move toward human-centered multimedia systems. To gain broad acceptance, image retrieval systems need a human-centered perspective, as argued in:
JAIMES, A., SEBE, N., AND GATICA-PEREZ, D. 2006. Human-centered computing: A multimedia perspective. In Proceedings of the ACM International Conference on Multimedia (Special Session on Human-Centered Multimedia).

II. Data scope
Institutional factors, such as the diversity of searches and the expected user base and traffic, also largely influence the design. Along these lines, we divide search data into the following categories:
(1) Personal collection
(2) Domain-specific collection
(3) Enterprise collection
(4) Archives
(5) Web
Google's Picasa system [Picasa 2004] provides a chronological display of images taking a user on a journey down memory lane.
A few notable applications:
implementation of a color-histogram-based image retrieval system:
KOTOULAS, L. AND ANDREADIS, I. 2003. Colour histogram content-based image retrieval and hardware implementation. IEEE Proc. Circ. Dev. Syst. 150, 5, 387–393.
FPGA implementation for subimage retrieval within an image database:
NAKANO, K. AND TAKAMICHI, E. 2003. An image retrieval system using FPGAs. In Proceedings of the Asia South Pacific Design Automation Conference.
a method for efficient retrieval in a network of imaging devices:
WOODROW, E. AND HEINZELMAN, W. 2002. Spin-It: A data centric routing protocol for image retrieval in wireless networks. In Proceedings of the IEEE International Conference on Image Processing (ICIP).
Discussion. Regardless of the nature of the collection, as the expected user-base grows, factors such as concurrent query support, efficient caching, and parallel and distributed processing of requests become critical. For future real-world image retrieval systems, both software and hardware approaches to address these issues are essential. More realistically, dedicated specialized servers, optimized memory and storage support, and highly parallelizable image search algorithms to exploit cluster computing power are where the future of large-scale image search hardware support lies.

III. Query modalities and processing
In the realm of image retrieval, an important parameter to measure user-system interaction level is the complexity of queries supported by the system. We describe next the various querying modalities, their characteristics, and the system support required thereof.
(1) Keywords
(2) Free-text
(3) Image: search for images similar to a given image
(4) Graphics
(5) Composite
Described from the perspective of system-side query processing:
(1) Text-based
(2) Content-based
(3) Composite
(4) Interactive-simple
(5) Interactive-composite

Processing text-based queries involves keyword matching using simple set-theoretic operations, and therefore a response can be generated very quickly.
Indexing and matching methods:
R-trees are used for indexing images represented as attributed relational graphs (ARGs):
PETRAKIS, E. G. M., DIPLAROS, A., AND MILIOS, E. 2002. Matching and retrieval of distorted and occluded shapes using dynamic programming. IEEE Trans. Pattern Anal. Mach. Intell. 24, 4, 509–522.
PETRAKIS, E. G. M., FALOUTSOS, C., AND LIN, K. I. 2002. Imagemap: An image indexing method based on spatial similarity. IEEE Trans. Knowl. Data Eng. 14, 5, 979–987.

Retrieval of images using wavelet coefficients as image representations and R∗-trees for indexing has been studied in Natsev et al. [2004]:
NATSEV, A., RASTOGI, R., AND SHIM, K. 2004. Walrus: A similarity retrieval algorithm for image databases. IEEE Trans. Knowl. Data Eng. 16, 3, 301–316.

Visual content matching using graph-based image representation and an efficient metric indexing algorithm has been proposed in Berretti et al. [2001]:
BERRETTI, S., BIMBO, A. D., AND VICARIO, E. 2001. Efficient matching and indexing of graph models in content-based retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 23, 10, 1089–1105.

Composite querying methods provide the users with more flexibility for expressing themselves.
综合查询方法提供了表达自己的用户提供更大的灵活性。
Some recent innovations in querying include sketch-based retrieval of color images [Chalechale et al. 2005].   一些最近发明的查询算法包括 彩色图像基于概括的图像检索。
CHALECHALE, A., NAGHDY, G., AND MERTINS, A. 2005. Sketch-based image matching using angular partitioning. IEEE Trans. Syst. Man Cybernet. 35, 1, 28–41.      （使用角分区的基于概述的图像匹配算法）
Querying using 3D models [Assfalg et al. 2002] has been motivated by the fact that 2D image queries are unable to capture the spatial arrangement of objects within the image.
ASSFALG, J., DEL BIMBO, A., AND PALA, P. 2002. Three-dimensional interfaces for querying by example in content-based image retrieval. IEEE Trans. Visualiz. Comput. Graphics 8, 4, 305–318.
In another interesting work, a multimodal system involving hand gestures and speech for querying and relevance feedback was presented in Kaster et al. [2003].
KASTER, T., PFEIFFER, M., AND BAUCKHAGE, C. 2003. Combining speech and haptics for intuitive and efficient navigation through image databases. In Proceedings of the 5th International Conference on Multimodal Interfaces (ICMI).
Certain new interaction-based querying paradigms statistically model the user's interest [Fang et al. 2005]:
FANG, Y. AND GEMAN, D. 2005. Experiments in mental face retrieval. In Proceedings of the International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA).
FANG, Y., GEMAN, D., AND BOUJEMAA, N. 2005. An interactive system for mental face retrieval. In Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval (MIR) at the International Multimedia Conference.
Other paradigms help the user refine her queries by providing cues and hints [Jaimes et al. 2004; Nagamine et al. 2004]:
JAIMES, A., OMURA, K., NAGAMINE, T., AND HIRATA, K. 2004. Memory cues for meeting video retrieval. In Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences (CARPE) at the ACM International Multimedia Conference.
NAGAMINE, T., JAIMES, A., OMURA, K., AND HIRATA, K. 2004. A visuospatial memory cue system for meeting video retrieval. In Proceedings of the ACM International Conference on Multimedia (demonstration).

Summary:
Discussion. A prerequisite for supporting text-based query processing is the presence of reliable metadata with pictures. However, pictures rarely come with reliable human tags. In recent years, effort has been put into building interactive, public-domain games for large-scale collection of high-level manual annotations. One such game (the ESP game) has become very popular and has helped accumulate human annotations for about a hundred thousand pictures [von Ahn and Dabbish 2004]. Collecting manual tags for pictures has a dual advantage: (1) it facilitates text-based querying, and (2) it builds reliable training datasets for content-based analysis and automatic annotation algorithms. As explored in Datta et al. [2007], it is possible to effectively bridge the paradigms of keyword- and content-based search through a unified framework, providing the user the flexibility of both without losing out on the search scope.
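One simple way such a unified framework can behave is to fuse a keyword-match score with a visual-similarity score when ranking results. The linear weighting, candidate ids, and scores below are invented for illustration and are not the actual scheme of Datta et al. [2007]:

```python
def fused_score(text_score, visual_score, alpha=0.5):
    """Linear fusion; alpha trades off keyword evidence vs. content evidence."""
    return alpha * text_score + (1 - alpha) * visual_score

# Hypothetical candidates: (text_score, visual_score) per image id.
candidates = {"img1": (1.0, 0.2), "img2": (0.0, 0.9), "img3": (0.5, 0.6)}

# Rank by the fused score, best first.
ranked = sorted(candidates, key=lambda i: fused_score(*candidates[i]), reverse=True)
```

With alpha = 1 this degenerates to pure text search, with alpha = 0 to pure content-based search, which is one way to read "the flexibility of both".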

IV. Visualization
Presentation of search results is perhaps one of the most important factors in the acceptance and popularity of an image retrieval system.
(1) Relevance-Ordered: results are ranked by some numeric measure of relevance.
(2) Time-Ordered: pictures are displayed in chronological order.
(3) Clustered
(4) Hierarchical
(5) Composite
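The first two presentation modes amount to plain sorts over the result set; the records below are made up for illustration:

```python
# Hypothetical search results with a relevance score and a timestamp each.
results = [
    {"id": "a", "score": 0.7, "timestamp": 3},
    {"id": "b", "score": 0.9, "timestamp": 1},
    {"id": "c", "score": 0.4, "timestamp": 2},
]

# Relevance-ordered: highest-scoring match first.
relevance_ordered = sorted(results, key=lambda r: r["score"], reverse=True)

# Time-ordered: chronological display, as in personal photo browsers.
time_ordered = sorted(results, key=lambda r: r["timestamp"])
```

Clustered and hierarchical presentations require an additional grouping step (e.g., clustering on visual features) before layout, which is why they are listed separately.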

In order to design interfaces for image retrieval systems, it helps to understand factors like how people manage their digital photographs [Rodden and Wood 2003] or frame their queries for visual art images [Cunningham et al. 2004].
RODDEN, K. AND WOOD, K. 2003. How do people manage their digital photographs? In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI).
CUNNINGHAM, S. J., BAINBRIDGE, D., AND MASOODIAN, M. 2004. How people describe their image information needs: A grounded theory analysis of visual arts queries. In Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL).

In Rodden et al. [2001], user studies on various ways of arranging images for browsing purposes are conducted, and the observation is that both visual-feature-based and concept-based arrangements have their own merits and demerits.
RODDEN, K., BASALAJ, W., SINCLAIR, D., AND WOOD, K. 2001. Does organization by similarity assist image browsing? In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI).

Thinking beyond the typical grid-based arrangement of top matching images, spiral and concentric visualizations of retrieval results have been explored in Torres et al. [2003].
TORRES, R. S., SILVA, C. G., MEDEIROS, C. B., AND ROCHA, H. V. 2003. Visual structures for image browsing. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM).

For personal images, innovative arrangements of query results based on visual content, time-stamps, and efficient use of screen space add new dimensions to the browsing experience [Huynh et al. 2005].
HUYNH, D. F., DRUCKER, S. M., BAUDISCH, P., AND WONG, C. 2005. Time quilt: Scaling up zoomable photo browsers for large, unstructured photo collections. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI).

Image transcoding techniques, which aim at adapting multimedia (image and video) content to the capabilities of the client device, have been studied extensively in the last several years [Shanableh and Ghanbari 2000; Vetro et al. 2003; Bertini et al. 2003; Cucchiara et al. 2003].
SHANABLEH, T. AND GHANBARI, M. 2000. Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats. IEEE Trans. Multimed. 2, 2, 101–110.
VETRO, A., CHRISTOPOULOS, C., AND SUN, H. 2003. Video transcoding architectures and techniques: An overview. IEEE Signal Process. Mag. 20, 2, 18–29.
BERTINI, M., CUCCHIARA, R., DEL BIMBO, A., AND PRATI, A. 2003. Object and event detection for semantic annotation and transcoding. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME).
CUCCHIARA, R., GRANA, C., AND PRATI, A. 2003. Semantic video transcoding using classes of relevance. Int. J. Image Graph. 3, 1, 145–170.

Summary:
Discussion. Study of organizations that maintain image management and retrieval systems [Tope and Enser 2000] has provided useful insights into system design, querying, and visualization. The final verdict of acceptance or rejection for any visualization scheme comes from end-users: while simple, intuitive interfaces such as grid-based displays have become acceptable to most search-engine users, advanced visualization techniques may still have a role to play.

V. Contemporary image retrieval systems
Recently, a public-domain search engine called Riya (see Figure 4) has been developed, which incorporates image retrieval and face recognition for searching pictures of people and products on the Web.
www.riya.com
It is also interesting to note that CBIR technology is being applied to domains as diverse as family album management, botany, astronomy, mineralogy, and remote sensing [Zhang et al. 2003; Wang et al. 2002; Csillaghy et al. 2000; Painter et al. 2003; Schroder et al. 2000].
ZHANG, L., CHEN, L., LI, M., AND ZHANG, H.-J. 2003. In Proceedings of the ACM International Conference on Multimedia. (face recognition)

CSILLAGHY, A., HINTERBERGER, H., AND BENZ, A. 2000. Content based image retrieval in astronomy. Inf. Retriev. 3, 3, 229–241.

PAINTER, T. H., DOZIER, J., ROBERTS, D. A., DAVIS, R. E., AND GREEN, R. O. 2003. Retrieval of sub-pixel snow-covered area and grain size from imaging spectrometer data. Remote Sens. Env. 85, 1, 64–77.

SCHRODER, M., REHRAUER, H., SEIDEL, K., AND DATCU, M. 2000. Interactive learning and probabilistic retrieval in remote sensing image archives. IEEE Trans. Geosci. Remote Sens. 38, 5, 2288–2298.

A publicly available similarity search tool [Wang et al. 2001] is being used for an online database of over 800,000 airline-related images [Airliners.Net 2005; Slashdot 2005] (http://airliners.net). Other examples include the integration of similarity search functionality into a large collection of art and cultural images [GlobalMemoryNet 2006], and the incorporation of image similarity into a massive picture archive [Terragalleria 2001] of the renowned travel photographer Q.-T. Luong.
Automatic Linguistic Indexing of Pictures—Real-Time (ALIPR), an automatic image annotation system [Li and Wang 2006a; 2008], has recently been made public for people to try to have their pictures annotated. As mentioned earlier, the presence of reliable tags with pictures is necessary for text-based image retrieval. As part of the ALIPR search engine, an effort to automatically validate computer-generated tags with human-given annotations is being used in an attempt to build a very large collection of searchable images (see Figure 5). Another work-in-progress is a Web image search system [Joshi et al. 2006a] that exploits visual features and textual metadata, using state-of-the-art algorithms, for a comprehensive search experience.

Discussion. Image analysis and retrieval systems have received widespread public and media interest of late [Mirsky 2006; Staedter 2006; CNN 2005]. It is reasonable to hope that in the near future the technology will diversify to many other domains. We believe that the future of real-world image retrieval lies in exploiting both text- and content-based search technologies. While the former is considered more reliable from a user viewpoint, an automatic image search engine that merges the two holds enormous potential, with content-based methods acting as an invisible component of Web image retrieval. This endeavor will hopefully be actualized in the years to come.

7. The real core problems of image retrieval technology

By the nature of its task, CBIR technology boils down to two intrinsic problems: (a) how to mathematically describe an image, and (b) how to assess the similarity between a pair of images based on their abstracted descriptions.
The first issue arises because the original representation of an image, an array of pixel values, corresponds poorly to our visual response, let alone to a semantic understanding of the image.

We refer to the mathematical description of an image, for retrieval purposes, as its signature. From the design perspective, the extraction of signatures and the calculation of image similarity cannot be cleanly separated. The formulation of signatures determines to a large extent the realm for definitions of similarity measures. On the other hand, intuitions are often the early motivating factors for designing similarity measures in a certain way, which in turn puts requirements on the construction of signatures.
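To make the signature/similarity split concrete, here is a deliberately crude sketch: a normalized gray-level histogram as the signature, and an L1 distance between histograms as the similarity measure. Both choices are illustrative stand-ins, not what any particular system in the survey uses:

```python
def histogram_signature(pixels, bins=4, max_value=256):
    """Problem (a): describe an image by a normalized intensity histogram."""
    hist = [0.0] * bins
    for p in pixels:
        hist[p * bins // max_value] += 1
    return [h / len(pixels) for h in hist]

def l1_distance(sig_a, sig_b):
    """Problem (b): compare two signatures; smaller means more similar."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))

# Two tiny hypothetical "images" given as flat pixel-intensity lists.
dark = histogram_signature([10, 20, 30, 40])       # all mass in the low bin
bright = histogram_signature([200, 210, 220, 230])  # all mass in the high bin
```

Note how the distance function only makes sense for this particular signature formulation, which is the coupling the paragraph above describes.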
In comparison with earlier, pre-2000 work in CBIR, a remarkable difference of recent years has been the increased diversity of image signatures. Advances have been made in both the derivation of new features (e.g., shape) and the construction of signatures based on these features, with the latter type of progress being more pronounced. The richness in the mathematical formulation of signatures grows alongside the invention of new methods for measuring similarity.
In terms of methodology development, a strong trend that has emerged in recent years is the employment of statistical and machine learning techniques in various aspects of CBIR technology.
I. Extracting visual content
Most CBIR systems perform feature extraction as a preprocessing step.

The current decade has seen great interest in region-based visual signatures, for which segmentation is the quintessential first step. While we begin the discussion with recent progress in image segmentation, we will see in the subsequent section that there is significant interest in segmentation-free techniques for feature extraction and signature construction.

Image Segmentation
To acquire a region-based signature, a key step is to segment images. Reliable segmentation is especially critical for characterizing shapes within images, without which the shape estimates are largely meaningless. We described earlier a widely used segmentation approach based on k-means clustering. This basic approach enjoys a speed advantage, but is not as refined as some recently developed methods.
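The k-means idea can be sketched in one dimension: pixel intensities are grouped around iteratively re-estimated cluster centers, and the resulting labels act as a very rough segmentation. Real systems cluster multi-dimensional color/texture features; the pixel values and initial centers below are invented:

```python
def kmeans_1d(values, centers, iters=10):
    """Toy k-means: assign each value to its nearest center, recompute centers."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # New center = mean of its cluster (keep old center if cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    labels = [min(range(len(centers)), key=lambda i: abs(v - centers[i]))
              for v in values]
    return labels, centers

# Six hypothetical pixels: a dark region and a bright region.
pixels = [12, 15, 10, 200, 205, 198]
labels, centers = kmeans_1d(pixels, centers=[0.0, 255.0])
```

The speed advantage mentioned above comes from each iteration being linear in the number of pixels, at the cost of ignoring spatial coherence entirely.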
One of the most important new advances in segmentation employs the normalized cuts criterion [Shi and Malik 2000].
SHI, J. AND MALIK, J. 2000. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22, 8, 888–905.

The problem of image segmentation is mapped to a weighted graph partitioning problem, where the vertex set of the graph is composed of image pixels and the edge weights represent perceptual similarity between pixel pairs.
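On a toy graph the criterion can be evaluated directly: the normalized cut of a partition (A, B) is cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V), so splitting along weak edges scores lower (better) than cutting through strongly connected pixels. The four-node graph and weights below are invented, and the eigenvector machinery the actual method uses to minimize this cost is not shown:

```python
def ncut_value(weights, part_a, part_b):
    """Normalized-cut cost of splitting the vertex set into part_a and part_b."""
    def assoc(s, t):
        # Total edge weight from set s to set t (weights is a symmetric dict).
        return sum(weights.get((i, j), 0.0) for i in s for j in t)
    all_nodes = part_a | part_b
    cut = assoc(part_a, part_b)
    return cut / assoc(part_a, all_nodes) + cut / assoc(part_b, all_nodes)

# Two tightly linked pixel pairs joined by one weak edge.
w = {}
for i, j, v in [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 0.1)]:
    w[(i, j)] = w[(j, i)] = v

good = ncut_value(w, {0, 1}, {2, 3})  # severs only the weak edge
bad = ncut_value(w, {0, 2}, {1, 3})   # severs both strong edges
```

The normalization by assoc(·, V) is what discourages the degenerate tiny-segment cuts that a plain minimum cut would prefer.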
The normalized cut segmentation method of Shi and Malik [2000] has also been extended to textured image segmentation by using cues of contour and texture differences [Malik et al. 2001], and to incorporate known partial grouping priors by solving a constrained optimization problem [Yu and Shi 2004].

Searching of medical image collections has been an increasingly important research problem of late, due to the high-throughput, high-resolution, and high-dimensional imaging modalities being introduced.
In this domain, 3D brain magnetic resonance (MR) images have been segmented using hidden Markov random fields and the expectation-maximization (EM) algorithm [Zhang et al. 2001], and the spectral clustering approach has found some success in segmenting vertebral bodies from sagittal MR images [Carballido-Gamio et al. 2004].
ZHANG, Y., BRADY, M., AND SMITH, S. 2001. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans. Medical Imag. 20, 1, 45–57.
CARBALLIDO-GAMIO, J., BELONGIE, S., AND MAJUMDAR, S. 2004. Normalized cuts in 3-D for spinal MRI segmentation. IEEE Trans. Medical Imag. 23, 1, 36–44.

Image Retrieval: Ideas, Influences, and Trends of the New Age
ACM literature

2. The various gaps introduced:
—Sensory. The sensory gap is the gap between the object in the world and the information in a (computational) description derived from a recording of that scene.
—Semantic. The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data has for a user in a given situation.

Image retrieval in the past (1990–2000)
3. In Smeulders et al. [2000], image search is divided into two domains: narrow and broad.
Narrow image domain: limited variability and better-defined visual characteristics (specific collections, e.g., medical images).

Three broad categories of image search:
(1) search by association: there is no clear intent for a particular image; search proceeds by iteratively refined browsing.
(2) aimed search: searching for a specific picture.
(3) category search: searching for a single image representative of a semantic class.
The overall goal therefore remains to bridge the semantic and sensorial gaps using the available visual features of images and relevant domain knowledge to support the varied search categories, and ultimately to satiate the user.
4. Extracting visual content from an image has two parts: image processing and feature construction.
In this context, search has been described as a specification of minimal invariant conditions that model the user intent, geared at reducing the sensory gap due to accidental distortions, clutter, occlusion, etc.

5. Once image features were extracted, the question remained as to how they could be indexed and matched against each other for retrieval.
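In its simplest form, the matching step reduces to ranking database signatures by their distance to the query signature. The 3-D feature vectors and ids below are invented, and a linear scan stands in for a real index structure (such as the R-trees discussed later):

```python
def top_k(query_sig, database, k=2):
    """Return the k database ids whose signatures are closest to the query."""
    def dist(a, b):
        # Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    ranked = sorted(database, key=lambda item: dist(query_sig, database[item]))
    return ranked[:k]

# Hypothetical pre-extracted signatures for three database images.
db = {"img1": [0.1, 0.9, 0.0], "img2": [0.8, 0.1, 0.1], "img3": [0.2, 0.8, 0.1]}
nearest = top_k([0.1, 0.9, 0.0], db, k=2)
```

Index structures exist precisely to avoid this O(n) scan over every stored signature as collections grow.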

6. Modern image retrieval (2000–2008)

I. User intent
(1) browser: a user browsing for pictures with no clear end-goal; a browsing session consists of a series of unrelated searches, and the browser may jump across many different topics within a session.
(2) surfer: a user with a moderately clear end-goal; a surfer may begin with exploratory behavior so that subsequent searches better reflect what she actually needs from the system.
(3) searcher: a user who knows very clearly what she is looking for in the system; search sessions are typically short, with a few related searches leading to the desired result.
One of the few such studies categorizes users as experts and novices and examines their interaction patterns with respect to a video library [Christel and Conescu 2005].
CHRISTEL, M. G. AND CONESCU, R. M. 2005. Addressing the challenge of visual information access from digital image and video libraries. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL).

Summary: what ultimately matters to the end-user is the interaction with the system and the responses it produces, which has motivated human-centered multimedia systems. To gain wide acceptance, image retrieval systems need to take a human-centered view, as argued in:
JAIMES, A., SEBE, N., AND GATICA-PEREZ, D. 2006. Human-centered computing: A multimedia perspective. In Proceedings of the ACM International Conference on Multimedia (Special Session on Human-Centered Multimedia).

II. Data scope
Institutional factors, such as the diversity of searches and the expected user base and traffic, also greatly influence design. Along these lines, we divide the searched data into the following categories:
(1) Personal Collection
(2) Domain-Specific Collection
(3) Enterprise Collection
(4) Archives
(5) Web
Google's Picasa system [Picasa 2004] provides a chronological display of images, taking a user on a journey down memory lane.
Several notable applications:
An implementation of a color-histogram-based image retrieval system in hardware:
KOTOULAS, L. AND ANDREADIS, I. 2003. Colour histogram content-based image retrieval and hardware implementation. IEEE Proc. Circ. Dev. Syst. 150, 5, 387–393.
An FPGA implementation for subimage retrieval within an image database:
NAKANO, K. AND TAKAMICHI, E. 2003. An image retrieval system using FPGAs. In Proceedings of the Asia South Pacific Design Automation Conference.
A method for efficient retrieval in a network of imaging devices:
WOODROW, E. AND HEINZELMAN, W. 2002. Spin-It: A data centric routing protocol for image retrieval in wireless networks. In Proceedings of the IEEE International Conference on Image Processing (ICIP).
Discussion. Regardless of the nature of the collection, as the expected user base grows, factors such as concurrent query support, efficient caching, and parallel and distributed processing of requests become critical. For future real-world image retrieval systems, both software and hardware approaches to addressing these issues are essential. More realistically, dedicated specialized servers, optimized memory and storage support, and highly parallelizable image search algorithms that exploit cluster computing power are where the future of large-scale image search hardware support lies.

III. Query modalities and processing
In the realm of image retrieval, an important parameter for measuring the level of user-system interaction is the complexity of the queries supported by the system. We describe next the various querying modalities, their characteristics, and the system support required for each:
(1) Keywords
(2) Free-Text
(3) Image (search using an example image similar to the one desired)
(4) Graphics
(5) Composite
Described from the perspective of system query processing:
(1)    —Text-Based 基于文本的
(2)    —Content-Based 基于内容的
(3)    —Composite 混合的
(4)    Interactive-Simple 单一交互式
(5)    Interactive-Composite 混合交互式

Processing text-based queries involves keyword matching using simple set-theoretic operations, and therefore a response can be generated very quickly.
处理基于文本的查询涉及到使用简单的关键字匹配的集合论操作，因此可以生成一个响应速度非常快。
文本检索方法：
R-trees are used for indexing images represented as attributed relational graphs (ARGs)
R-trees 作为引用关系图来对图像表示进行检索。
PETRAKIS, E. G. M., DIPLAROS, A., AND MILIOS, E. 2002. Matching and retrieval of distorted and occluded shapes using dynamic programming. IEEE Trans. Pattern Anal. Mach. Intell. 24, 4, 509–522.
PETRAKIS, E. G. M., FALOUTSOS, C., AND LIN, K. I. 2002. Imagemap: An image indexing method based on spatial similarity. IEEE Trans. Knowl. Data Eng. 14, 5, 979–987

Retrieval of images using wavelet coefficients as image representations and R∗-trees for indexing has been studied in Natsev et al. [2004]    使用小波系数作为图像表示进行检索，使用R树进行索引
NATSEV, A., RASTOGI, R., AND SHIM, K. 2004. Walrus: A similarity retrieval algorithm for image databases. IEEE Trans. Knowl. Data Eng. 16, 3, 301–316.    （图像数据库的相似检索算法）

Visual content matching using graph-based image representation and an efficient metric indexing algorithm has been proposed in Berretti et al. [2001]   使用基于图的图像表示方法进行视觉内容匹配并使用一个高效的度量索引算法进行索引。
BERRETTI, S., BIMBO, A. D., AND VICARIO, E. 2001. Efficient matching and indexing of graph models in content-based retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 23, 10, 1089–1105.   （基于内容的图像检索中的图模型匹配和索引方法）

Composite querying methods provide the users with more flexibility for expressing themselves.
综合查询方法提供了表达自己的用户提供更大的灵活性。
Some recent innovations in querying include sketch-based retrieval of color images [Chalechale et al. 2005].   一些最近发明的查询算法包括 彩色图像基于概括的图像检索。
CHALECHALE, A., NAGHDY, G., AND MERTINS, A. 2005. Sketch-based image matching using angular partitioning. IEEE Trans. Syst. Man Cybernet. 35, 1, 28–41.      （使用角分区的基于概述的图像匹配算法）
Querying using 3D models [Assfalg et al. 2002] has been motivated by the fact that 2D image queries are unable to capture the spatial arrangement of objects within the image. 3D模型检索的出现是由于2D图像检索不能够抓住图像空间布局的特点。
ASSFALG, J., DEL BIMBO, A., AND PALA, P. 2002. Three-Dimensional interfaces for querying by example in content-based image retrieval. IEEE Trans. Visualiz. Comput. Graphics 8, 4, 305–318.   基于内容的图像检索的三维接口
In another interesting work, a multimodal system involving hand gestures and speech for querying and relevance feedback was presented in Kaster et al. [2003].
在另一感兴趣的方面，多模型系统引入了手势和语言来检索并且提出了相关反馈机制
KASTER, T., PFEIFFER, M., AND BAUCKHAGE, C. 2003. Combining speech and haptics for intuitive and efficient navigation through image databases. In Proceedings of the 5th International Conference on Multimidia Interfaces (ICMI). 通过图像数据库对手势和触觉进行直观有效的导航。
Certain new interaction-based querying paradigms which statistically model the user’s interest [Fang et al. 2005],  新的基于互动的查询示例，用来统计用户感兴趣的模型
FANG, Y. AND GEMAN, D. 2005. Experiments in mental face retrieval. In Proceedings of the International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA).
FANG, Y., GEMAN, D., AND BOUJEMAA, N. 2005. An interactive system for mental face retrieval. In Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval (MIR) at the International Multimedia Conference.
help the user refine her queries by providing cues and hints [Jaimes et al. 2004; Nagamine et al. 2004]
通过提供线索和提示帮助用户定义搜索。
JAIMES, A., OMURA, K., NAGAMINE, T., AND HIRATA, K. 2004. Memory cues for meeting video retrieval. In Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences (CARPE) at the ACM International Multimedia Conference.
NAGAMINE, T., JAIMES, A., OMURA, K., AND HIRATA, K. 2004. A visuospatial memory cue system for meeting video retrieval. In Proceedings of the ACM International Conference on Multimedia (demonstration).

总结 ：
Discussion. A prerequisite for supporting text-based query processing is the presence of reliable metadata with pictures. However, pictures rarely come with reliable human tags. In recent years, there has been effort put into building interactive, public-domain games for large-scale collection of high-level manual annotations. One such game (the ESP game) has become very popular and has helped accumulate human annotations for about a hundred thousand pictures [von Ahn and Dabbish 2004]. Collection of manual tags for pictures has the dual advantage of: (1) facilitating text-based querying, and (2) building reliable training datasets for content-based analysis and automatic annotation algorithms. As explored in Datta et al. [2007], it is possible to effectively bridge the paradigms of keyword- and content-based search through a unified framework to provide the user the flexibility of both, without losing out on the search scope.
支持基于文本检索的先决条件是存在可信的图像元数据，然而图像却很少有可用的人为标签。近些年，人们将精力主要放在建立交互式的公共领域游戏，该游戏用来对于高层次手册说明的大规模收集。ESP十分流行并且帮助收集了大约十万图像的用户手册。手机图像的用户标签主要有以下优势：促进基于文本的查询，为基于内容的分析和自动标记算法建立可信的训练数据库，能够有效的连接关键字和基于内容搜索的数据，通过统一框架来提供用户多样的，不丢失的搜索领域。

四．可视化
Presentation of search results is perhaps one of the most important factors in the acceptance and popularity of an image retrieval system   在可接受和流行的图像检索系统中，对于搜索结果的描述应该是最重要的一个因素了。
(1)     —Relevance-Ordered  关联 有序    搜索结果通过一些数字衡量进行排序
(2)     Time-Ordered  时间 排序    图片显示按时间排序
(3)     —Clustered  云
(4)     Hierarchical  分级
(5)     Composite 混合

In order to design interfaces for image retrieval systems, it helps to understand factors like how people manage their digital photographs [Rodden and Wood 2003] or frame their queries for visual art images Cunningham et al. [2004].   为了设计用于图像检索系统的接口，它有助于人们了解类似因素如何管理自己的数码照片
RODDEN, K. AND WOOD, K. 2003. How do people manage their digital photographs? In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI).
CUNNINGHAM, S. J., BAINBRIDGE, D., AND MASOODIAN, M. 2004. How people describe their image information needs: A grounded theory analysis of visual arts queries. In Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL).

In Rodden et al. [2001], user studies on various ways of arranging images for browsing purposes are conducted, and the observation is that both visual-feature-based and concept-based arrangements have their own merits and demerits.   用户为浏览图片研究各种排列图像的方法，这个观察的结果是基于视觉特征和基于概念的排列方法各有他们自己的优点和缺点
RODDEN, K., BASALAJ, W., SINCLAIR, D., AND WOOD, K. 2001. Does organization by similarity assist image browsing? In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI).

Thinking beyond the typical grid-based arrangement of top matching images, spiral and concentric visualization of retrieval results have been explored in Torres et al. [2003]. 基于网格的图像匹配，螺旋和检索可视化检索结果已经被提出。
TORRES, R. S., SILVA, C. G., MEDEIROS, C. B., AND ROCHA, H. V. 2003. Visual structures for image browsing. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM).

For personal images, innovative arrangements of query results based on visual content, time-stamps, and efficient use of screen space add new dimensions to the browsing experience [Huynh et al. 2005].
HUYNH, D. F., DRUCKER, S. M., BAUDISCH, P., AND WONG, C. 2005. Time quilt: Scaling up zoomable photo browsers for large, unstructured photo collections. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI).

Image transcoding techniques, which aim at adapting multimedia (image and video) content to the capabilities of the client device, have been studied extensively in the last several years [Shanableh and Ghanbari 2000; Vetro et al. 2003; Bertini et al. 2003; Cucchiara et al. 2003].  图像转码技术，旨在适应多媒体和客户设备，在这些年已经大量研究了
SHANABLEH, T. AND GHANBARI, M. 2000. Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats. IEEE Trans. Multimed. 2, 2, 101–110.  （异构视频转码到低时空解决和不同编码格式）
VETRO, A., CHRISTOPOULOS, C., AND SUN, H. 2003. Video transcoding architectures and techniques: An overview. IEEE Signal Process. Mag. 20, 2, 18–29.   视频转码结构和技术
BERTINI, M., CUCCHIARA, R., DEL BIMBO, A., AND PRATI, A. 2003. Object and event detection for semantic annotation and transcoding. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME). 语言标注和转码的物体时间检测
CUCCHIARA, R., GRANA., C., AND PRATI, A. 2003. Semantic video transcoding using classes of relevance. Int. J. Image Graph. 3, 1, 145–170.    使用相关类的语义视频转码

总结：
Discussion. Study of organizations which maintain image management and retrieval systems has provided useful insights into system design, querying, and visualization. In Tope and Enser [2000], The final verdict of acceptance rejection for any visualization scheme comes from end-users.

研究表米娜那些坚持图像管理和检索系统组织已经对系统设计查询和可视化的有了一定的认识，
虽然简单，如基于网格的显示直观的界面，已成为可以接受的大多数搜索引擎用户，先进的可视化技术还可以在决策。

五．现代图像检索系统
Recently, a public-domain search engine called Riya (see Figure 4) has been developed, which incorporates image retrieval and face recognition for searching pictures of people and products on the Web.
大众主流搜索引擎Riya，包含了图像检索和脸部识别功能。
www.riya.com
It is also interesting to note that CBIR technology is being applied to domains as diverse as family album management, botany, astronomy, mineralogy, and remote sensing [Zhang et al. 2003;
Wang et al. 2002; Csillaghy et al. 2000; Painter et al. 2003; Schroder et al. 2000].
这也是值得注意的基于内容图像检索技术被应用于不同领域的家庭相册管理，植物学，天文学，矿物学，和遥感
ZHANG, L., CHEN, L., LI, M., AND ZHANG, H.-J. 2003. In Proceedings of the ACM International Conference on Multimedia    人脸识别

CSILLAGHY, A., HINTERBERGER, H., AND BENZ, A. 2000. Content based image retrieval in astronomy. Inf. Retriev.3, 3, 229–241.   天气预报

PAINTER, T. H., DOZIER, J., ROBERTS, D. A., DAVIS, R. E., AND GREEN, R. O. 2003. Retrieval of sub-pixel snow-covered area and grain size from imaging spectrometer data. Remote Sens. Env. 85, 1, 64–77.    积雪和粮食方面

SCHRODER, M.,REHRAUER, H., SEIDEL, K., ANDDATCU,M. 2000. Interactive learning and probabilistic retrieval in remote sensing image archives. IEEE Trans. Geosci. Remote Sens. 38, 5, 2288–2298.

A publicly available similarity search tool [Wang et al. 2001] is being used for an online database of over 800, 000 airline-related images [Airliners.Net 2005; Slashdot 2005]
http://airliners.net

the integration of similarity search functionality to a large collection of art and cultural images [GlobalMemoryNet 2006], and the incorporation of image similarity to a massive picture archive [Terragalleria 2001] of the renowned travel photographer Q.-T. Luong.
Automatic Linguistic Indexing of Pictures—Real-Time (ALIPR), an automatic image
annotation system [Li andWang 2006a; 2008], has been recently made public for people
to try to have their pictures annotated. As mentioned earlier, presence of reliable tags
with pictures is necessary for text-based image retrieval. As part of the ALIPR search
engine, an effort to automatically validate computer generated tags with human-given
annotation is being used in an attempt to build a very large collection of searchable
images (see Figure 5). Another work-in-progress is a Web image search system [Joshi
et al. 2006a] that exploits visual features and textual metadata, using state-of-the-art
algorithms, for a comprehensive search experience.

Discussion. Image analysis and retrieval systems have received widespread public and media interest of late [Mirsky 2006; Staedter 2006; CNN 2005]. It is reasonable to hope that in the near future, the technology will diversify to many other domains. We believe that the future of real-world image retrieval lies in exploiting both text- and content-based search technologies. While the former is considered more reliable from a user viewpoint, . This endeavor will hopefully be actualized in the years to come.
图像分析和检索系统已经得到了广泛的运用，有理由相信，在未来这项技术能够发展到很多其他的领域，我们相信在未来现实的图像检索可以使用基于文本的和基于内容的技术。然而基于文本的检索更依赖于用户的观点，在合并二者建立的自动图像搜索引擎有着巨大的潜力，他将成为网络图像检索的隐形的部分。这一努力希望在未来能够成真。

7. The Real Core Problems of Image Retrieval Technology

By the nature of its task, the CBIR technology boils down to two intrinsic problems:(a) how to mathematically describe an image, and (b) how to assess the similarity between a pair of images based on their abstracted descriptions.
The first issue arises because the original representation of an image, an array of pixel values, corresponds poorly to our visual response, let alone to a semantic understanding of the image.

We refer to the mathematical description of an image, for retrieval purposes, as its signature. From the design perspective, the extraction of signatures and the calculation of image similarity cannot be cleanly separated. The formulation of signatures determines to a large extent the realm for definitions of similarity measures. On the other hand, intuitions are often the early motivating factors for designing similarity measures in a certain way, which in turn puts requirements on the construction of signatures.
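As a concrete illustration of how signature and similarity are coupled, the toy sketch below (an illustration, not any system described in the survey) uses a normalized global color histogram as the signature and histogram intersection as the similarity measure; it is precisely because the signature is a normalized histogram that intersection is a meaningful similarity:

```python
import numpy as np

def signature(image, bins=8):
    """Reduce an RGB image (H x W x 3 uint8 array) to a compact
    signature: a normalized per-channel color histogram."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    sig = np.concatenate(hists).astype(float)
    return sig / sig.sum()

def similarity(sig_a, sig_b):
    """Histogram intersection: 1.0 for identical signatures, 0.0 for disjoint."""
    return float(np.minimum(sig_a, sig_b).sum())

rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
img_b = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
print(similarity(signature(img_a), signature(img_a)))  # identical image -> 1.0
print(similarity(signature(img_a), signature(img_b)))
```

Changing the signature (say, to region-based descriptors) immediately changes which similarity measures make sense, which is the coupling described above.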
In comparison with earlier, pre-2000 work in CBIR, a remarkable difference of recent years has been the increased diversity of image signatures. Advances have been made in both the derivation of new features (e.g., shape) and the construction of signatures based on these features, with the latter type of progress being more pronounced. The richness in the mathematical formulation of signatures grows alongside the invention of new methods for measuring similarity.
In terms of methodology development, a strong trend which has emerged in recent years is the employment of statistical and machine learning techniques in various aspects of the CBIR technology.
1. Extracting Visual Features
Most CBIR systems perform feature extraction as a preprocessing step.

The current decade has seen great interest in region-based visual signatures, for which segmentation is the quintessential first step. While we begin the discussion with recent progress in image segmentation, we will see in the subsequent section that there is significant interest in segmentation-free approaches to feature extraction and signature construction.

Image Segmentation
To acquire a region-based signature, a key step is to segment images. Reliable segmentation is especially critical for characterizing shapes within images, without which the shape estimates are largely meaningless. We described earlier a widely used segmentation approach based on k-means clustering. This basic approach enjoys a speed advantage, but is not as refined as some recently developed methods.
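A minimal sketch of the k-means approach just described, clustering pixels by color alone (real systems add texture and position features; the deterministic brightness-based initialization here is only for reproducibility, not part of the standard algorithm):

```python
import numpy as np

def kmeans_segment(image, k=2, iters=10):
    """Segment an H x W x 3 image by clustering pixel colors with k-means."""
    pixels = image.reshape(-1, 3).astype(float)
    # deterministic init: spread initial centers across pixel brightness
    order = np.argsort(pixels.sum(axis=1))
    centers = pixels[order[np.linspace(0, len(pixels) - 1, k).astype(int)]].copy()
    for _ in range(iters):
        # assign every pixel to its nearest center
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean color of its assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels.reshape(image.shape[:2])

# A toy image with two flat color regions; k-means separates them cleanly.
img = np.zeros((16, 16, 3), dtype=np.uint8)
img[:, 8:] = 255
seg = kmeans_segment(img, k=2)
print(np.unique(seg).size)  # 2
```

The speed advantage mentioned above comes from this simplicity: each iteration is just a nearest-center assignment and a mean update.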
One of the most important new advances in segmentation employs the normalized cuts criterion [Shi and Malik 2000]:
SHI, J. AND MALIK, J. 2000. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22, 8, 888–905.

The problem of image segmentation is mapped to a weighted graph partitioning problem, where the vertex set of the graph is composed of image pixels and the edge weights represent some perceptual similarity between pixel pairs.
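A toy version of that graph formulation, for a six-pixel one-dimensional "image": build an affinity matrix from pairwise intensity similarity, form the normalized Laplacian, and split on the sign of the second-smallest eigenvector, which approximates the normalized cut (the Gaussian bandwidth 0.1 is an arbitrary choice for this sketch):

```python
import numpy as np

# Six "pixels" in two clear intensity groups: three dark, three bright.
intensity = np.array([0.0, 0.1, 0.05, 0.9, 1.0, 0.95])
n = len(intensity)

# Affinity matrix W: edge weight = perceptual (intensity) similarity.
W = np.exp(-(intensity[:, None] - intensity[None, :]) ** 2 / 0.1)
np.fill_diagonal(W, 0.0)

# Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt

# The eigenvector of the second-smallest eigenvalue approximates the
# normalized cut; thresholding it at zero bipartitions the pixels.
_, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)
print(labels)
```

For real images, each pixel is a vertex, so the affinity matrix is sparse (edges only between nearby pixels) and specialized eigensolvers are used.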
The normalized cut segmentation method of Shi and Malik [2000] has also been extended to textured image segmentation by using cues of contour and texture differences [Malik et al. 2001], and to incorporate known partial grouping priors by solving a constrained optimization problem [Yu and Shi 2004].

Searching medical image collections has become an increasingly important research problem of late, due to the high-throughput, high-resolution, and high-dimensional imaging modalities that have been introduced.
(Notes on medical image retrieval.)
In this domain, 3D brain magnetic resonance (MR) images have been segmented using hidden Markov random fields and the expectation-maximization (EM) algorithm [Zhang et al. 2001], and the spectral clustering approach has found some success in segmenting vertebral bodies from sagittal MR images [Carballido-Gamio et al. 2004].
ZHANG, Y., BRADY, M., AND SMITH, S. 2001. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans. Medical Imag. 20, 1, 45–57.
CARBALLIDO-GAMIO, J., BELONGIE, S., AND MAJUMDAR, S. 2004. Normalized cuts in 3-D for spinal MRI segmentation. IEEE Trans. Medical Imag. 23, 1, 36–44.


Source: http://highscalability.com/blog/2014/2/3/how-google-backs-up-the-internet-along-with-exabytes-of-othe.html

How Google Backs Up The Internet Along With Exabytes Of Other Data

Raymond Blum leads a team of Site Reliability Engineers charged with keeping Google's data secret and keeping it safe. Of course Google would never say how much data this actually is, but from comments it seems that it is not yet a yottabyte, but is many exabytes in size. GMail alone is approaching low exabytes of data.
Mr. Blum, in the video How Google Backs Up the Internet, explained why common backup strategies don't work for Google, for a very googly-sounding reason: typically they scale effort with capacity. If backing up twice as much data requires twice as much stuff (time, energy, space, etc.) to do it, it won't work; it doesn't scale. You have to find efficiencies so that capacity can scale faster than the effort needed to support that capacity. A different plan is needed when making the jump from backing up one exabyte to backing up two exabytes. And the talk is largely about how Google makes that happen.
Some major themes of the talk:
- No data loss, ever. Even the infamous GMail outage did not lose data, but the story is more complicated than just a lot of tape backup. Data was retrieved from across the stack, which requires engineering at every level, including the human.
- Backups are useless. It's the restore you care about. It's a restore system, not a backup system. Backups are a tax you pay for the luxury of a restore. Shift work to backups and make them as complicated as needed to make restores so simple a cat could do them.
- You can't scale linearly. You can't have 100 times as much data require 100 times the people or machine resources. Look for force multipliers. Automation is the major way of improving utilization and efficiency.
- Redundancy in everything. Google stuff fails all the time. It's crap, the same way cells in our body die. Google doesn't dream that things don't die; it plans for it.
- Diversity in everything. If you are worried about site locality, put data in multiple sites. If you are worried about user error, have levels of isolation from user interaction. If you want protection from a software bug, put it on different software. Store stuff on different vendor gear to reduce the effect of large vendor bugs.
- Take humans out of the loop. How many copies of an email are kept by GMail? It's not something a human should care about. Some parameters are configured and the system takes care of it. This is a constant theme: high-level policies are set, and systems make it so. Only bother a human if something outside the norm occurs.
- Prove it. If you don't try it, it doesn't work. Backups and restores are continually tested to verify that they work.
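The "prove it" theme, continuously restoring a random sample of backups and comparing checksums, can be sketched as follows. This is a toy in-memory model (the dict standing in for tape, the 5% sample, and all names are illustrative, not Google's actual system):

```python
import hashlib
import random

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# At backup time, record a checksum alongside each archive.
# (Toy model: real archives live on tape, not in a dict.)
archives = {f"tape-{i:03d}": f"mailbox payload {i}".encode() for i in range(100)}
manifest = {name: checksum(data) for name, data in archives.items()}

def restore(name: str) -> bytes:
    """Stand-in for reading the archive back off the physical medium."""
    return archives[name]

# Continuously sample ~5% of backups, restore them, and compare checksums,
# so bad backups are found before anyone actually needs a restore.
random.seed(1)
sample = random.sample(sorted(archives), k=5)
failures = [n for n in sample if checksum(restore(n)) != manifest[n]]
print(failures)  # [] when every sampled restore round-trips intact
```

The key design point is that verification compares checksums recorded at backup time, since the live originals may have changed by the time the restore test runs.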
There’s a lot to learn here for any organization, big or small. Mr. Blum’s talk is entertaining, informative, and well worth watching. He does really seem to love the challenge of his job.
Here’s my gloss on this very interesting talk where we learn many secrets from inside the beast:
Data availability must be 100%. No data loss ever.
- Statistically, losing 200KB of a 2GB file may sound acceptable, but the file is probably now useless; think of an executable or a tax return.
- Availability of data is more important than availability of access. If a system is down, it's not the end of the world. If data is lost, it is.
- Google guarantees you are covered for all of the following, in every possible combination:
  - location isolation
  - isolation from application layer problems
  - isolation from storage layer problems
  - isolation from media failure
- Consider the dimensions you can move the sliders around on. Put software on the vertical and location on the horizontal. If you want to cover everything, you need a copy of each layer of the software in different locations. You can do that with VMs in different locations.

Redundancy is not the same as recoverability.
- Making many copies does not help meet the no-loss guarantee.
- Many copies are effective for certain kinds of outages. If an asteroid hits a datacenter and you have a copy far away, you are covered.
- If you have a bug in your storage stack, copying to N places doesn't help, because the bug corrupts all copies. See the GMail outage as an example.
- There aren't as many asteroids as there are bugs in code, user errors, or writes of a corrupt buffer.
- Redundancy is good for locality of reference. Copying is good when you want all data references as close as possible to where the data is being used.

The entire system is incredibly robust because there's so much of it.
- Google stuff fails all the time. It's crap, the same way cells in our body die. They don't dream that things don't die; they plan for it. Machines die all the time.
- Redundancy is the answer. The result is more reliable in the aggregate than a single high-quality machine. A single machine can be destroyed by an asteroid. Machines put in 50 different locations are much harder to destroy.

Massively parallel systems have more opportunities for loss.
- MapReduce running on 30K machines is great, until you have a bug. The same bug is waiting to run everywhere at once, which magnifies the effect.

Local copies do not protect against site outages.
- If you have a flood in your server room, RAID doesn't help.
- Google File System (GFS), used throughout Google until about a year ago, takes the concept of RAID up a notch: coding techniques are used to write to multiple datacenters in different cities at once, so you only need N-1 fragments to reconstruct the data. With three datacenters, one can die and you still have the data available.

Availability and integrity are an organization-wide characteristic.
- Google engineers across BigTable, GFS, and Colossus all know that data durability and integrity is job one. Lots of systems are in place to check and correct any lapses in data availability and integrity.

You want diversity in everything.
- If you are worried about site locality, put data in multiple sites.
- If you are worried about user error, have levels of isolation from user interaction.
- If you want protection from a software bug, put it on different software. Store stuff on different vendor gear to reduce the effect of large vendor bugs.

Tape to back things up works really, really well.
- Tape is great because it's not disk. If they could, they would use punch cards.
- Imagine you have a bug in a device driver for SATA disks. Tapes save you from that. It increases your diversity, because different media implies different software.
- Tape capacity is following Moore's law, so they are fairly happy with tape as a backup medium, though they are working on alternatives; they won't say what those are.
- Tapes are encrypted, implying that nefarious forces would have a hard time getting anything useful from them.

Backups are useless. It's the restore you care about.
- Find out if there's a problem before someone needs the data. When you need a restore, you really need it.
- Run continuous restores. Constantly select 5% of backups at random and restore them for comparison. Why? To find out whether backups work before data is lost. This catches a lot of problems.
- Run automatic comparisons. You can't compare to the original, because the originals have changed. So checksum everything and compare the checksums. Get the data back to the source media (disk, flash, or wherever it came from) and make sure it can do a round trip. This is done all the time.

Alert on changes in rates of failure.
- If something is different, you probably want to know about it. If everything is running normally, don't tell me.
- Expect some failures; don't alert on a file that fails to restore on the first attempt.
- Say the rate of failure on the first attempt is typically N, and the rate of failure on the second attempt is Y. If there's a change in those rates of failure, then something has gone wrong.

Everything breaks.
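Returning to the alerting rule above (page on changes in the failure rate, never on a single failed attempt), here is a minimal sketch. The three-standard-deviation binomial threshold is an assumption of this example, not something stated in the talk:

```python
def should_alert(baseline_rate, observed_failures, observed_total, tolerance=3.0):
    """Alert only when the observed failure count departs from the historical
    baseline by more than `tolerance` standard deviations (binomial
    approximation), not on any single failed attempt."""
    expected = baseline_rate * observed_total
    stddev = (observed_total * baseline_rate * (1 - baseline_rate)) ** 0.5
    return abs(observed_failures - expected) > tolerance * stddev

# If 100 failures out of 10,000 writes (1%) is the norm:
print(should_alert(0.01, 100, 10_000))  # normal day -> False
print(should_alert(0.01, 300, 10_000))  # 3x the norm -> True
```

This matches the "100 tapes a day is fine, 300 is a page" behavior described later in the talk.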
- Disks break all the time, but you know when it happens because you are monitoring them.
- With tape, you don't know a tape is broken until you try to use it. Tapes last a long time, but you want to test a tape before you need it.

RAID4 on tape.
- Don't write to just one tape. Tapes are cartridges; a robot might drop one, or some magnetic flux might occur. Don't take a chance.
- When writing to tape, tell the writer to hold on to the data until you say it's OK to change it. If it changes early, the contract is broken.
- Build up 4 full tapes, then generate a 5th code tape by XORing everything together. You can lose any one of the 5 tapes and recover the data.
- Now tell the writer it can change the source data, because the data has made it to its final physical location and is redundant.
- Every bit of data Google backs up goes through this process.
- Hundreds of tapes a month are lost, but there aren't hundreds of cases of data loss per month, because of this process. If one tape is lost, it is detected by the continuous restores, and the sibling tapes are used to rebuild another tape, and all is well. In the rare case where two tapes are corrupted, data is only lost if the same two spots on both tapes are damaged, so reconstruction is done at the sub-tape level.
- There is no data loss thanks to these techniques. It's expensive, but it's the cost of doing business.

Backups are a tax you pay for the luxury of a restore.
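Stepping back to the RAID4-on-tape scheme: the XOR parity construction can be sketched in a few lines. Losing any single tape leaves enough information to rebuild it from the survivors plus the parity tape (the byte strings standing in for tapes are, of course, illustrative):

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Four full "tapes" plus one XOR parity tape, as in the talk.
tapes = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(tapes)

# Lose any one tape: XOR the survivors with the parity to rebuild it.
lost = 2
survivors = [t for i, t in enumerate(tapes) if i != lost]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt)  # b'CCCC'
```

Because XOR is its own inverse, XORing all five tapes together yields all zeros, which is why any four of the five determine the fifth.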
- It's a restore system, not a backup system. Restores are a non-maskable interrupt; they trump everything. Get the backup restored.
- Make backups as complicated as they need to be, and let them take as long as they need. Make restores as quick and automatic as possible.
- Recovery should be stupid, fast, and simple. You want a cat to be able to trigger a central restore.
- Restores could happen when you are well rested or when you are dog tired, so you don't want the human element to determine the success of restoring a copy of your serving data. You are under stress. Do all the work and thinking when you have all the time in the world, which is when making the backup.
- A huge percentage of systems work this way.
- Data sources may have to store data for a period, perhaps days, before it can be promised that it is backed up. But once backed up, it can be restored, and restored quickly.
- They no longer make the most efficient use of media on backups, in order to make restores faster. Taking two hours to read a tape is bad. Write only half a tape and read the halves in parallel, so you get the data back in half the time.

Scale is a problem.
- When you have exabytes of data, there are real-world constraints. If you have to copy 10 exabytes, it could take 10 weeks to back up each day's data.
- With datacenters around the world, you have a few choices. Do you give near-infinite backup capacity to every site? Do you cluster all the backup by region? What about the bandwidth to ship the data? Don't you need that bandwidth to serve money-making traffic?
- Look at relevant costs. There are compromises. Not every site has backup facilities. You must balance available capacity on the network and work out where you get the most bang for the buck. For example, backups must happen at site X because it has the bandwidth.

You can't scale linearly.
- You can't just say you want more network bandwidth and more tape drives. Drives break, so if you have 10,000 times the number of drives, you'll need 10,000 times the number of operators to replace them. Do you have 10,000 times the loading-dock space to hold the tape drives until a truck picks them up? None of this can be linear.
- Though the number of tape libraries has gone up a full order of magnitude, there aren't 10 times as many people involved. There are somewhat more, but far from a linear increase.
- An example is the early prediction that, as the number of telephones grew, 30% of the US population would be employed as telephone operators. What they didn't see coming was automated switching.

Automate everything.
- Scheduling is automated. If you have a service, you say: I have a datastore, I need a copy every N, and restores must happen in M. Internal systems make this happen: backups are scheduled, restore testing is run, integrity testing is run, and so on. When a broken tape is detected, it is automatically handled.
- You as a human don't see any of this. You might someday ask how many tapes are breaking on average. Or an alert might go out if the rate of tape breakage changes from 100 tapes per day to 300 tapes per day. But until then, don't report that 100 tapes a day broke if that's within the norm.

Humans should not be involved in steady-state operations.
- Packing up and shipping drives is still a human activity. Automated interfaces prepare shipping labels, get RMA numbers, check that packages have gone out, and get acknowledgement of receipt; if this breaks down, a human has to intervene.
- Library software maintenance likewise. If there's a firmware update, a human doesn't run to every system and perform the upgrade. It is downloaded, pushed to a canary library, tested, and the results verified as accurate; then it is pushed out everywhere. A human isn't involved in normal operations.

Handle machine death automatically.
- Machines are dying twice a minute. If a machine dies during a MapReduce job that uses 30,000 machines, don't tell me about it; just handle it and move on. Find another machine, move the work, and restart.
- If there are dependencies, then schedule a wait. If the wait goes on too long, then let me know. The system handles its own scheduling. This is a job for an algorithm, not a human.

Keep efficiency improving with growth.
- Improve utilization and efficiency a lot. You can't have 100 times as much data require 100 times the people or machine resources.

The Great GMail Outage and Restoral of 2011. The story of how Google dropped data and got it back.
- At 10:31 AM on a Sunday he got a page that said "Holly Crap call xxx-xxxx".
- GMail is approaching low exabytes of data. That's a lot of tapes.
- Recovery was 100%. Availability was not 100%: it wasn't all there on the first or second day, but at the end of the period it was all there.
- A whole series of bugs and mishaps occurred in the layer where replication happens. Yes, there were three identical files, but they were all empty. Even with unit tests, system tests, and integration tests, bugs get through.
- Data was restored from tape, a massive job. This is where restore time is relative to scale: getting back a gigabyte can be done in milliseconds to seconds, while getting back 200,000 inboxes of several GB each takes a while.
- They woke up a couple of colleagues in Europe because they were fresher. An advantage of a distributed workforce.
- Data was restored from many tapes and verified. It didn't take weeks or months; it took days, which they were happy with. Other companies in similar situations have taken a month just to realize they couldn't get the data back. Steps have been taken to make the process faster next time.
- One tape drive takes 2 hours to read. The tapes were located all over the place; otherwise no single location would have had enough capacity to read all the tapes involved in the restoration.
- With compression and checksums, they actually didn't need to read 200K tapes.
- The restore process has been much improved since then.

Prioritize restores.
- Archived data can be restored after more important data, like your current inbox and sent email.
- Accounts that have not been touched in a month can wait while more active users are restored first.

The backup system is viewed as a huge global organism.
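Returning to restore prioritization: the idea can be sketched with a priority queue, restoring active accounts before dormant ones. The 30-day threshold comes from the talk; the account data and two-tier scheme are illustrative:

```python
import heapq

def priority(account):
    """Lower number = restored sooner. Accounts untouched for over a
    month can wait behind active users."""
    return 0 if account["days_idle"] <= 30 else 1

accounts = [
    {"name": "alice", "days_idle": 2},    # active: restore first
    {"name": "bob", "days_idle": 90},     # dormant: can wait
    {"name": "carol", "days_idle": 5},    # active: restore first
]

# Tuples compare element-wise, so the index breaks ties deterministically.
queue = [(priority(a), i, a["name"]) for i, a in enumerate(accounts)]
heapq.heapify(queue)
order = [heapq.heappop(queue)[2] for _ in range(len(queue))]
print(order)  # ['alice', 'carol', 'bob']
```

A real scheduler would add more tiers (current inbox before archives) within each account as well.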
- You don't want GMail backups just in New York, for example, because if that datacenter grew or shrank, the backups would need to scale with it.
- Treat backup as one giant, world-spanning system. When a backup occurs, it might happen somewhere else entirely.
- A restore from a tape has to happen where the tape is located. But until the data makes it to tape, it could be in New York while the backup is in Oregon, because that's where there was capacity. Location isolation is handled automatically; no client is told where its data is backed up.
- Capacity can be moved around. As long as there's global capacity and the network can support it, it doesn't matter where the tapes are located.

The more data you have, the more important it is to keep it.
- The larger things are, the more important they are, as a rule. Google used to be just search. Now it is GMail, stuff held in drives, docs, and more. It's both larger now and more important.

Have a good infrastructure.
- It's really good to have general-purpose Swiss Army knives at your disposal. When MapReduce was written, they probably never thought of it being used for backups. But without already having MapReduce, the idea of using it for backups couldn't have occurred.

Scaling is really important, and you can't have any piece of it (software, infrastructure, hardware, processes) that doesn't scale.
- You can't say you're going to deploy more tape drives without also having the operations staff. If you are going to hire twice as many people, do you have twice as many parking spots? Room in the cafeteria? Restrooms? Everything has to scale up, or you'll hit one single bottleneck and it will stop you.

Prove it.
- Don't take anything for granted. Hope is not a strategy.
- If you don't try it, it doesn't work. A restore must happen to verify a backup. Until you get to the end, you haven't proven anything. This attitude has found a lot of failures.

DRT: disaster recovery testing.
- Every N months a disaster scenario is played out, simulating the response at every level of the organization. How will the company survive without whatever the disaster took away? It must learn to adapt.
- This finds enormous holes in infrastructure and physical security.
- Imagine a datacenter with one road leading to it, and trucks on that road carrying fuel for the backup generators. What happens when the road is lost? Better have another road, and another supplier for diesel fuel.
- They do have supply-chain redundancy strategies.

Redundancy in different software stacks, at different locations, at different points in time.
- Don't let data just migrate through the stack. Keep the data in different layers of the stack for a particular dwell period, so that if you lose this and this, you still have the data somewhere in an overlap: time, location, and software.
- Consider the GMail outage example. How, if replication was corrupted, could no data be lost? This was a question from the audience, and he didn't really want to give details. Data is constantly being backed up. Say we have the data as of 9PM, and the corruption started at 8PM but hadn't made it to tape yet. The corruption was stopped; software was rolled back to a working release. At some point in the stack, all the data was still there: there's stuff on tape, stuff being replicated, stuff on the front-ends, stuff in logs. There was overlap from all these sources, and it was possible to reconstruct all the data. Policy is to not take data out of one stack until N hours after it has been put in another stack, for just these cases.

Delete problem.
- I want to delete this. Google is not going to rewrite tapes just to delete data; it's too expensive at scale.
- One approach is to do something smart with encryption keys. He didn't tell us what Google does. Perhaps the key is destroyed, which effectively deletes the data?

A giant organization can work when you trust your colleagues and shard responsibilities.
- Trust that they understand their part.
- Make sure organizational and software interfaces are well defined. Implement verification tests between layers.

Whitelisting and blacklisting.
- Ensure data is in a guaranteed location, and guaranteed not to be in a certain location. This goes against much of the rest of the philosophy, which is location diversity and location independence.
- This was not originally a feature of the stack; it had to be added to support government requirements.
- Responsibility is pushed down as low as possible in the stack. Fill out the right profile and it magically happens.
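On the delete problem above: one way the "smart things with encryption keys" idea could work is crypto-shredding, where data is stored encrypted and deletion means destroying the key rather than rewriting the tape. The talk does not confirm this is what Google does, and the cipher below is a toy (a SHA-256 counter keystream), not production cryptography:

```python
import hashlib
import secrets

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR `data` with a SHA-256 counter keystream. Symmetric: applying it
    twice with the same key decrypts. A toy only; a real system would use
    an authenticated cipher such as AES-GCM."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

record = b"please forget this mailbox"
key = secrets.token_bytes(32)
stored = toy_encrypt(key, record)            # what would sit on tape
assert toy_encrypt(key, stored) == record    # key held: data recoverable
key = None  # "delete" by destroying the key; the ciphertext is now useless
print(len(stored))
```

The appeal at scale is that destroying a few bytes of key material logically deletes data scattered across any number of tapes.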