    Implementing Real-Time Trending Topics With a Distributed Rolling Count Algorithm in Storm

    JAN 18TH, 2013

    Update 2014-06-04: I updated several references to point to the latest version of storm-starter, which is now part of the official Storm project.

    A common pattern in real-time data workflows is performing rolling counts of incoming data points, also known as sliding window analysis. A typical use case for rolling counts is identifying trending topics in a user community – such as on Twitter – where a topic is considered trending when it has been among the top N topics in a given window of time. In this article I will describe how to implement such an algorithm in a distributed and scalable fashion using the Storm real-time data processing platform. The same code can also be used in other areas such as infrastructure and security monitoring.

    About Trending Topics and Sliding Windows

    First, let me explain what I mean by “trending topics” so that we have a common understanding. Here is an explanation taken from Wikipedia:

    Trending topics

    A word, phrase or topic that is tagged at a greater rate than other tags is said to be a trending topic. Trending topics become popular either through a concerted effort by users or because of an event that prompts people to talk about one specific topic. These topics help Twitter and their users to understand what is happening in the world.

    Wikipedia page on Twitter en.wikipedia.org/wiki/…

    In other words, it is a measure of “What’s hot?” in a user community. Typically, you are interested in trending topics for a given time span; for instance, the most popular topics in the past five minutes or the current day. So the question “What’s hot?” is more precisely stated as “What’s hot today?” or “What’s hot this week?”.

    In this article we assume we have a system that uses the Twitter API to pull the latest tweets from the live Twitter stream. We assume further that we have a mechanism in place that extracts topical information in the form of words from those tweets. For instance, we could opt to use a simple pattern matching algorithm that treats #hashtags in tweets as topics. Here, we would consider a tweet such as

    @miguno The #Storm project rocks for real-time distributed #data processing!

    to “mention” the topics

    storm
    data
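
    For illustration, a simple hashtag-based topic extractor along these lines could look as follows. This is just a sketch to make the idea concrete: the class and method names are made up for this example and are not part of storm-starter.

    Sketch: extracting topics from #hashtags

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper that treats #hashtags in a tweet as topics.
    public class HashtagTopicExtractor {

        private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

        // e.g. "The #Storm project rocks for #data processing!" -> [storm, data]
        public List<String> extractTopics(String tweet) {
            List<String> topics = new ArrayList<String>();
            Matcher matcher = HASHTAG.matcher(tweet);
            while (matcher.find()) {
                topics.add(matcher.group(1).toLowerCase());
            }
            return topics;
        }
    }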

    We design our system so that it considers topic A more popular than topic B (for a given time span) if topic A has been mentioned more often in tweets than topic B. This means we only need to count the number of occurrences of topics in tweets.

    popularity(A) > popularity(B)  ⟺  mentions(A) > mentions(B)

    For the context of this article we do not care how the topics are actually derived from user content or user activities as long as the derived topics are represented as textual words. Then, the Storm topology described in this article will be able to identify in real time the trending topics in this input data using a time-sensitive rolling count algorithm (rolling counts are also known as sliding windows) coupled with a ranking step. The former aspect takes care of filtering user input by time span, the latter of ranking the most trendy topics at the top of the list.

    Eventually we want our Storm topology to periodically produce the top N of trending topics similar to the following example output, where t0 to t2 are different points in time:

    Rank @ t0   ----->   t1   ----->   t2
    ---------------------------------------------
    1.    java   (33)   ruby   (41)   scala  (32)
    2.    php    (30)   scala  (28)   python (29)
    3.    scala  (21)   java   (27)   ruby   (24)
    4.    ruby   (16)   python (21)   java   (21)
    5.    python (15)   php    (14)   erlang (18)

    In this example we can see that over time “scala” has become the hottest trending topic.

    Sliding Windows

    The last background aspect I want to cover is sliding windows, aka rolling counts. A picture is worth a thousand words:

    Figure 1: As the sliding window advances, the slice of its input data changes. In the example above the algorithm uses the current sliding window data to compute the sum of the window’s elements.

    A formula might also be worth a bunch of words – ok, ok, maybe not a full thousand of them – so mathematically speaking we could formalize such a sliding-window sum algorithm as follows:

    m-sized rolling sum = Σ_{i=t}^{t+m} element(i)

    where t continually advances (most often with time) and m is the window size.

    From size to time: If the window is advanced with time, say every N minutes, then the individual elements in the input represent data collected over the same interval of time (here: N minutes). In that case the window size is equivalent to N x m minutes. Simply speaking, if N=1 and m=5, then our sliding window algorithm emits the latest five-minute aggregates every one minute.
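
    To make this size-to-time relationship concrete, here is a minimal, self-contained sketch of an m-sized rolling sum over a ring buffer of per-interval aggregates. It is illustration only and not code from storm-starter; with N=1 and m=5 you would call sumAndAdvance() once per minute and get the latest five-minute sum.

    Sketch: an m-sized rolling sum over a ring buffer

    // Each slot holds the aggregate of one N-minute interval;
    // sumAndAdvance() is called once per interval.
    public class RollingSum {

        private final long[] slots; // m buckets
        private int head = 0;       // bucket of the current interval

        public RollingSum(int m) {
            slots = new long[m];
        }

        public void add(long value) {
            slots[head] += value; // accumulate into the current interval
        }

        public long sumAndAdvance() {
            long sum = 0;
            for (long slot : slots) { // rolling sum over the last m intervals
                sum += slot;
            }
            head = (head + 1) % slots.length;
            slots[head] = 0; // reuse the oldest bucket for the new interval
            return sum;
        }
    }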

    Now that we have introduced trending topics and sliding windows we can finally start talking about writing code for Storm that implements all this in practice – large-scale, distributed, in real time.

    Before We Start

    About storm-starter

    The storm-starter project on GitHub provides example implementations of various Storm real-time data processing topologies such as a simple streaming WordCount algorithm. It also includes a Rolling Top Words topology that can be used for computing trending topics, the purpose of which is exactly what I want to cover in this article.

    When I began to tackle trending topic analysis with Storm I expected that I could re-use most if not all of the Rolling Top Words code in storm-starter. But I soon realized that the old code would need some serious redesigning and refactoring before one could actually use it in a real-world environment – including being able to efficiently maintain and augment the code in a team of engineers across release cycles.

    In the next section I will briefly summarize the state of the Rolling Top Words topology before and after my refactoring to highlight some important changes and things to consider when writing your own Storm code. Then I will continue with covering the most important aspects of the new implementation in further detail. And of course I contributed the new implementation back to the Storm project.

    The Old Code and My Goals for the New Code

    Just to be absolutely clear here: I am talking about the defects of the old code to highlight some typical pitfalls during software development for a distributed system such as Storm. My intention is to make other developers aware of these gotchas so that we make fewer mistakes in our profession. I am by no means implying that the authors of the old code did a bad job (after all, the old code was perfectly adequate to get me started with trending topics in Storm) or that the new implementation I came up with is the pinnacle of coding. :-)

    My initial reaction to the old code was that, frankly speaking, I had no idea what and how it was doing its job. The various logical responsibilities of the code were mixed together in the existing classes, clearly not abiding by the Single Responsibility Principle. And I am not talking about academic treatments of SRP and such – I was hands-down struggling to wrap my head around the old code because of this.

    Also, I noticed a few synchronized statements and threads being launched manually, hinting at additional parallel operations beyond what the Storm framework natively provides you with. Here, I was particularly concerned with those functionalities that interacted with the system time (calls to System.currentTimeMillis()). I couldn’t help the feeling that they looked prone to concurrency issues. And my suspicions were eventually confirmed when I discovered a dirty-write bug in the RollingCountObjects bolt code for the slot-based counting (using long[]) of object occurrences. In practice this dirty-write bug in the old rolling count implementation caused data corruption, i.e. the code was not carrying out its main responsibility correctly – that of counting objects. That said I’d argue that it would not have been trivial to spot this error in the old code prior to refactoring (where it was eventually plain to see), so please don’t think it was just negligence on the part of the original authors. With the new tick tuple feature in Storm 0.8 I was feeling confident that this part of the code could be significantly simplified and fixed.

    In general I figured that completely refactoring the code and untangling these responsibilities would not only make the code more approachable and readable for me and others – after all the storm-starter code’s main purpose is to jumpstart Storm beginners – but it would also allow me to write meaningful unit tests, which would have been very difficult to do with the old code.

    | What             | Before refactoring | After refactoring |
    |------------------|--------------------|-------------------|
    | Storm Bolts      | RollingCountObjects, RankObjects, MergeObjects | RollingCountBolt, IntermediateRankingsBolt, TotalRankingsBolt |
    | Storm Spouts     | TestWordSpout | TestWordSpout (not modified) |
    | Data Structures  | - | SlotBasedCounter, SlidingWindowCounter, Rankings, Rankable, RankableObjectWithFields |
    | Unit Tests       | - | Every class has its own suite of tests. |
    | Additional Notes | Uses manually launched background threads instead of native Storm features to execute periodic activities. | Uses the new tick tuple feature in Storm 0.8 to trigger periodic activities in Storm components. |
    Table 1: The state of the trending topics Storm implementation before and after the refactoring.

    The design and implementation that I will describe in the following sections are the result of a number of refactoring iterations. I started with smaller code changes that served primarily to help me understand the existing code better (e.g. more meaningful variable names, splitting long methods into smaller logical units). The more comfortable I felt, the more substantial the changes I introduced. Unfortunately the existing code was not accompanied by any unit tests, so while refactoring I was in the dark, risking to break something without even being aware of it. I considered writing unit tests for the existing code first and then going back to refactoring, but I figured that this would not be the best approach given the state of the code and the time I had available.

    In summary my goals for the new trending topics implementation were:

    1. The new code should be clean and easy to understand, both for the benefit of other developers when adapting or maintaining the code and for reasoning about its correctness. Notably, the code should decouple its data structures from the Storm sub-system and, if possible, favor native Storm features for concurrency instead of custom approaches.
    2. The new code should be covered by meaningful unit tests.
    3. The new code should be good enough to contribute it back to the Storm project to help its community.

    Implementing the Data Structures

    Eventually I settled on the following core data structures for the new distributed Rolling Count algorithm. As you will see, an interesting characteristic is that these data structures are completely decoupled from any Storm internals. Our Storm bolts will make use of them, of course, but there is no dependency in the opposite direction from the data structures to Storm.

    Another notable improvement is that the new code removes any need for concurrency-related code such as synchronized statements or manually started background threads. Also, none of the data structures interact with the system time. Eliminating direct calls to system time and manually started background threads makes the new code much simpler and more testable than before.

    No more interacting with system time in the low level data structures, yay!
    // such code from the old RollingCountObjects bolt is not needed anymore
    long delta = millisPerBucket(_numBuckets)
                   - (System.currentTimeMillis() % millisPerBucket(_numBuckets));
    Utils.sleep(delta);
    

    SlotBasedCounter

    The SlotBasedCounter class provides per-slot counts of the occurrences of objects. The number of slots of a given counter instance is fixed. The class provides four public methods:

    SlotBasedCounter API
    public void incrementCount(T obj, int slot);
    public void wipeSlot(int slot);
    public long getCount(T obj, int slot);
    // get the *total* counts of all objects across all slots
    public Map<T, Long> getCounts();
    

    Here is a usage example:

    Using SlotBasedCounter
    // we want to count Objects using five slots
    SlotBasedCounter<Object> counter = new SlotBasedCounter<Object>(5);
    
    // counting
    Object trackMe = ...;
    int currentSlot = 0;
    counter.incrementCount(trackMe, currentSlot);
    
    // the count of an object for a given slot
    long count = counter.getCount(trackMe, currentSlot);
    
    // the total counts (across all slots) of all objects
    Map<Object, Long> counts = counter.getCounts();
    

    Internally SlotBasedCounter is backed by a Map<T, long[]> for the actual count state. You might be surprised to see the low-level long[] array here – wouldn’t it be better OO style to introduce a new, separate class that is just used for the counting of a single slot, and then we use a couple of these single-slot counters to form the SlotBasedCounter? Well, yes we could. But for performance reasons and for not deviating too far from the old code I decided not to go down this route. Apart from updating the counter – which is a WRITE operation – the most common operation in our use case is a READ operation to get the total counts of tracked objects. Here, we must calculate the sum of an object’s counts across all slots. And for this it is preferable to have the individual data points for an object close to each other (kind of data locality), which the long[] array allows us to do. Your mileage may vary though.
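
    To illustrate the Map<T, long[]> layout described above, here is a condensed sketch of how such a slot-based counter can be implemented. It is not the actual storm-starter class (the real SlotBasedCounter is a bit more elaborate, e.g. it also cleans up objects whose total count has dropped to zero), but it shows the per-slot WRITE path and the sum-across-slots READ path:

    Sketch: a slot-based counter backed by Map<T, long[]>

    import java.util.HashMap;
    import java.util.Map;

    public class SimpleSlotBasedCounter<T> {

        private final Map<T, long[]> counts = new HashMap<T, long[]>();
        private final int numSlots;

        public SimpleSlotBasedCounter(int numSlots) {
            this.numSlots = numSlots;
        }

        public void incrementCount(T obj, int slot) {
            long[] slotCounts = counts.get(obj);
            if (slotCounts == null) {
                slotCounts = new long[numSlots];
                counts.put(obj, slotCounts);
            }
            slotCounts[slot]++;
        }

        public long getCount(T obj, int slot) {
            long[] slotCounts = counts.get(obj);
            return slotCounts == null ? 0 : slotCounts[slot];
        }

        public void wipeSlot(int slot) {
            for (long[] slotCounts : counts.values()) {
                slotCounts[slot] = 0;
            }
        }

        // the common READ path: total counts of each object across all slots
        public Map<T, Long> getCounts() {
            Map<T, Long> totals = new HashMap<T, Long>();
            for (Map.Entry<T, long[]> entry : counts.entrySet()) {
                long total = 0;
                for (long c : entry.getValue()) {
                    total += c;
                }
                totals.put(entry.getKey(), total);
            }
            return totals;
        }
    }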

    Figure 2: The SlotBasedCounter class keeps track of multiple counts of a given object. In the example above, the SlotBasedCounter has five logical slots which allows you to track up to five counts per object.

    The SlotBasedCounter is a primitive class that can be used, for instance, as a building block for implementing sliding window counting of objects. And this is exactly what I will describe in the next section.

    SlidingWindowCounter

    The SlidingWindowCounter class provides rolling counts of the occurrences of “things”, i.e. a sliding window count for each tracked object. Its counting functionality is based on the previously described SlotBasedCounter. The size of the sliding window is equivalent to the (fixed) number of slots of a given SlidingWindowCounter instance. It is used by RollingCountBolt for counting incoming data tuples.

    The class provides two public methods:

    SlidingWindowCounter API
    public void incrementCount(T obj);
    public Map<T, Long> getCountsThenAdvanceWindow();
    

    What might be surprising to some readers is that this class does not have any notion of time even though “sliding window” normally means a time-based window of some kind. In our case however the window does not advance with time but whenever (and only when) the method getCountsThenAdvanceWindow() is called. This means SlidingWindowCounter behaves just like a normal ring buffer in terms of advancing from one window to the next.
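
    This ring-buffer behavior can be sketched as follows, building on a slot-based counter like the one shown above. Again, this is a simplified illustration rather than the exact storm-starter code:

    Sketch: a sliding window counter as a ring buffer of slots

    import java.util.Map;

    public class SimpleSlidingWindowCounter<T> {

        private final SimpleSlotBasedCounter<T> counter;
        private final int windowLengthInSlots;
        private int headSlot = 0; // slot that receives new counts
        private int tailSlot;     // oldest slot, wiped when the window advances

        public SimpleSlidingWindowCounter(int windowLengthInSlots) {
            this.windowLengthInSlots = windowLengthInSlots;
            this.counter = new SimpleSlotBasedCounter<T>(windowLengthInSlots);
            this.tailSlot = slotAfter(headSlot);
        }

        public void incrementCount(T obj) {
            counter.incrementCount(obj, headSlot);
        }

        // return the current window counts, then advance the window by one slot
        public Map<T, Long> getCountsThenAdvanceWindow() {
            Map<T, Long> counts = counter.getCounts();
            counter.wipeSlot(tailSlot); // forget the data of the oldest slot
            headSlot = tailSlot;
            tailSlot = slotAfter(headSlot);
            return counts;
        }

        private int slotAfter(int slot) {
            return (slot + 1) % windowLengthInSlots;
        }
    }

    Note how the window only advances when getCountsThenAdvanceWindow() is called; nothing in this class knows about wall-clock time.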

    Note: While working on the code I realized that parts of my redesign decisions – teasing apart the concerns – were close in spirit to those of the LMAX Disruptor concurrent ring buffer, albeit much simpler of course. Firstly, to limit concurrent access to the relevant data structures (here: mostly what SlidingWindowCounter is being used for). In my case I followed the SRP and split the concerns into new data structures in a way that actually allowed me to eliminate the need for ANY concurrent access. Secondly, to put a strict sequencing concept in place (the way incrementCount(T obj) and getCountsThenAdvanceWindow() interact) that would prevent dirty reads or dirty writes from happening as was unfortunately possible in the old, system time based code.

    If you have not heard about LMAX Disruptor before, make sure to read their LMAX technical paper (PDF) on the LMAX homepage for inspirations. It’s worth the time!

    Figure 3: The SlidingWindowCounter class keeps track of multiple rolling counts of objects, i.e. a sliding window count for each tracked object. Please note that the example of an 8-slot sliding window counter above is simplified as it only shows a single count per slot. In reality SlidingWindowCounter tracks multiple counts for multiple objects.

    Here is an illustration showing the behavior of SlidingWindowCounter over multiple iterations:

    Figure 4: Example of SlidingWindowCounter behavior for a counter of size 4. Again, the example is simplified as it only shows a single count per slot.

    Rankings and Rankable

    The Rankings class represents fixed-size rankings of objects, for instance to implement “Top 10” rankings. It ranks its objects descendingly according to their natural order, i.e. from largest to smallest. This class is used by AbstractRankerBolt and its derived bolts to track the current rankings of incoming objects over time.

    Note: The Rankings class itself is completely unaware of the bolts’ time-based behavior.

    The class provides five public methods:

    Rankings API
    public void updateWith(Rankable r);
    public void updateWith(Rankings other);
    public List<Rankable> getRankings();
    public int maxSize(); // as supplied to constructor
    public int size(); // current size, might be less than maximum size
    

    Whenever you update Rankings with new data, it will discard any elements that are smaller than the updated top N, where N is the maximum size of the Rankings instance (e.g. 10 for a top 10 ranking).

    Now the sorting aspect of the ranking is driven by the natural order of the ranked objects. In my specific case, I created a Rankable interface that in turn extends the Comparable interface. In practice, you simply pass a Rankable object to the Rankings class, and the latter will update its rankings accordingly.
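
    To illustrate what “discard any elements that are smaller than the updated top N” boils down to, here is a simplified sketch of a fixed-size ranking of Comparable elements (illustration only; the real Rankings class is more involved, e.g. it also handles updates for objects that are already ranked):

    Sketch: maintaining a fixed-size, descending ranking

    import java.util.Collections;
    import java.util.LinkedList;
    import java.util.List;

    public class SimpleRankings<T extends Comparable<T>> {

        private final int maxSize;
        private final List<T> rankedItems = new LinkedList<T>();

        public SimpleRankings(int maxSize) {
            this.maxSize = maxSize;
        }

        public void updateWith(T item) {
            rankedItems.add(item);
            Collections.sort(rankedItems);
            Collections.reverse(rankedItems); // largest first (natural order, descending)
            while (rankedItems.size() > maxSize) {
                rankedItems.remove(rankedItems.size() - 1); // drop everything below the top N
            }
        }

        public List<T> getRankings() {
            return new LinkedList<T>(rankedItems);
        }
    }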

    Using the Rankings class
    Rankings topTen = new Rankings(10);
    Rankable r = ...;
    topTen.updateWith(r);
    
    List<Rankable> rankings = topTen.getRankings();
    

    As you can see it is really straightforward and intuitive to use.

    Figure 5: The Rankings class ranks Rankable objects descendingly according to their natural order, i.e. from largest to smallest. The example above shows a Rankings instance with a maximum size of 10 and a current size of 8.

    The concrete class implementing Rankable is RankableObjectWithFields. The bolt IntermediateRankingsBolt, for instance, creates Rankables from incoming data tuples via a factory method of this class:

    IntermediateRankingsBolt.java
        @Override
        void updateRankingsWithTuple(Tuple tuple) {
            Rankable rankable = RankableObjectWithFields.from(tuple);
            super.getRankings().updateWith(rankable);
        }
    

    Have a look at Rankings, Rankable and RankableObjectWithFields for details. If you run into a situation where you have to implement classes like these yourself, make sure you follow good engineering practice and add standard methods such as equals() and hashCode() to your data structures as well.

    Implementing the Rolling Top Words Topology

    So where are we? In the sections above we have already discussed a number of Java classes but not even a single one of them has been directly related to Storm. It’s about time that we start writing some Storm code!

    In the following sections I will describe the Storm components that make up the Rolling Top Words topology. When reading the sections keep in mind that the “words” in this topology represent the topics that are currently being mentioned by the users in our imaginary system.

    Overview of the Topology

    The high-level view of the Rolling Top Words topology is shown in the figure below.

    Figure 6: The Rolling Top Words topology consists of instances of TestWordSpout, RollingCountBolt, IntermediateRankingsBolt and TotalRankingsBolt. The length of the sliding window (in secs) as well as the various emit frequencies (in secs) are just example values – depending on your use case you would, for instance, prefer to have a sliding window of five minutes and emit the latest rolling counts every minute.

    The main responsibilities are split as follows:

    1. In the first layer the topology runs many TestWordSpout instances in parallel to simulate the load of incoming data – in our case this would be the names of the topics (represented as words) that are currently being mentioned by our users.
    2. The second layer comprises multiple instances of RollingCountBolt, which perform a rolling count of incoming words/topics.
    3. The third layer uses multiple instances of IntermediateRankingsBolt (“I.R. Bolt” in the figure) to distribute the load of pre-aggregating the various incoming rolling counts into intermediate rankings. Hadoop users will see a strong similarity here to the functionality of a combiner in Hadoop.
    4. Lastly, there is the final step in the topology. Here, a single instance of TotalRankingsBolt aggregates the incoming intermediate rankings into a global, consolidated total ranking. This bolt outputs the currently trending topics in the system. These trending topics can then be used by downstream data consumers to provide all the cool user-facing and backend features you want to have in your platform.

    In code the topology wiring looks as follows in RollingTopWords:

    RollingTopWords.java
    builder.setSpout(spoutId, new TestWordSpout(), 2);
    builder.setBolt(counterId, new RollingCountBolt(9, 3), 3)
                .fieldsGrouping(spoutId, new Fields("word"));
    builder.setBolt(intermediateRankerId, new IntermediateRankingsBolt(TOP_N), 2)
                .fieldsGrouping(counterId, new Fields("obj"));
    builder.setBolt(totalRankerId, new TotalRankingsBolt(TOP_N))
                .globalGrouping(intermediateRankerId);
    
    Note: The integer parameters of the setSpout() and setBolt() methods (do not confuse them with the integer parameters of the bolt constructors) configure the parallelism of the Storm components. See my article Understanding the Parallelism of a Storm Topology for details.

    TestWordSpout

    The only spout we will be using is the TestWordSpout that is part of the backtype.storm.testing package of Storm itself. I will not cover the spout in detail because it is a trivial class. The only thing it does is to select a random word from a fixed list of five words (“nathan”, “mike”, “jackson”, “golda”, “bertels”) and emit that word to the downstream topology every 100ms. For the sake of this article, we consider these words to be our “topics”, of which we want to identify the trending ones.

    Note: Because TestWordSpout selects its output words at random (with each word having the same probability of being selected), in most cases the counts of the various words are pretty close to each other. This is ok for example code such as ours. In a production setting though you most likely want to generate “better” simulation data.
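
    To make the spout’s behavior concrete, here is a simplified re-creation of such a word-emitting spout (the class name is made up; it is not a verbatim copy of TestWordSpout, but it captures what the spout does):

    Sketch: a spout that emits a random word every 100ms

    import java.util.Map;
    import java.util.Random;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import backtype.storm.utils.Utils;

    public class RandomWordSpout extends BaseRichSpout {

        private static final String[] WORDS = { "nathan", "mike", "jackson", "golda", "bertels" };

        private SpoutOutputCollector collector;
        private Random rand;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            this.rand = new Random();
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100); // emit roughly every 100 milliseconds
            String word = WORDS[rand.nextInt(WORDS.length)];
            collector.emit(new Values(word));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }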

    The spout’s output can be visualized as follows. Note that the @XXXms milliseconds timeline is not part of the actual output.

    @100ms: nathan
    @200ms: golda
    @300ms: golda
    @400ms: jackson
    @500ms: mike
    @600ms: nathan
    @700ms: bertels
    ...

    Excursus: Tick Tuples in Storm 0.8+

    A new and very helpful (read: awesome) feature of Storm 0.8 is the so-called tick tuple. Whenever you want a spout or bolt to execute a task at periodic intervals – in other words, you want to trigger an event or activity – using a tick tuple is normally the best practice.

    Nathan Marz described tick tuples in the Storm 0.8 announcement as follows:

    Tick tuples: It’s common to require a bolt to “do something” at a fixed interval, like flush writes to a database. Many people have been using variants of a ClockSpout to send these ticks. The problem with a ClockSpout is that you can’t internalize the need for ticks within your bolt, so if you forget to set up your bolt correctly within your topology it won’t work correctly. 0.8.0 introduces a new “tick tuple” config that lets you specify the frequency at which you want to receive tick tuples via the “topology.tick.tuple.freq.secs” component-specific config, and then your bolt will receive a tuple from the __system component and __tick stream at that frequency.

    Nathan Marz on the Storm mailing list groups.google.com/forum/#!msg/…

    Here is how you configure a bolt/spout to receive tick tuples every 10 seconds:

    Configuring a bolt/spout to receive tick tuples every 10 seconds
        @Override
        public Map<String, Object> getComponentConfiguration() {
            Config conf = new Config();
            int tickFrequencyInSeconds = 10;
            conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, tickFrequencyInSeconds);
            return conf;
        }
    

    Usually you will want to add a conditional switch to the component’s execute method to tell tick tuples and “normal” tuples apart:

    Telling tick tuples and normal tuples apart
        @Override
        public void execute(Tuple tuple) {
            if (isTickTuple(tuple)) {
                // now you can trigger e.g. a periodic activity
            }
            else {
                // do something with the normal tuple
            }
        }
    
        private static boolean isTickTuple(Tuple tuple) {
            return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
                && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
        }
    

    Be aware that tick tuples are sent to bolts/spouts just like “regular” tuples, which means they will be queued behind other tuples that a bolt/spout is about to process via its execute() or nextTuple() method, respectively. As such the time interval you configure for tick tuples is, in practice, served on a “best effort” basis. For instance, if a bolt is suffering from high execution latency – e.g. due to being overwhelmed by the incoming rate of regular, non-tick tuples – then you will observe that the periodic activities implemented in the bolt will get triggered later than expected.

    I hope that, like me, you can appreciate the elegance of solely using Storm’s existing primitives to implement the new tick tuple feature. :-)

    RollingCountBolt

    This bolt performs rolling counts of incoming objects, i.e. sliding window based counting. Accordingly it uses the SlidingWindowCounter class described above to achieve this. In contrast to the old implementation only this bolt (more correctly: the instances of this bolt that run as Storm tasks) is interacting with the SlidingWindowCounter data structure. Each instance of the bolt has its own private SlidingWindowCounter field, which eliminates the need for any custom inter-thread communication and synchronization.

    The bolt combines the previously described tick tuples (which trigger at fixed intervals in time) with the time-agnostic behavior of SlidingWindowCounter to achieve time-based sliding window counting. Whenever the bolt receives a tick tuple, it will advance the window of its private SlidingWindowCounter instance and emit its latest rolling counts. In the case of normal tuples it will simply count the object and ack the tuple.
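
    Before looking at the bolt’s execute() method below, it helps to see how the time-based configuration maps onto the time-agnostic counter. With the values used in the topology wiring above (a 9-second window emitted every 3 seconds) the counter needs 9 / 3 = 3 slots, one slot per emit interval. The snippet below is illustrative, not a verbatim excerpt of RollingCountBolt:

    Sketch: deriving the number of window slots from the time-based configuration

    int windowLengthInSeconds = 9;  // length of the sliding window
    int emitFrequencyInSeconds = 3; // tick tuple / emit frequency
    // one slot per emit interval, so the whole window covers windowLengthInSeconds
    int numWindowSlots = windowLengthInSeconds / emitFrequencyInSeconds; // = 3
    SlidingWindowCounter<Object> counter = new SlidingWindowCounter<Object>(numWindowSlots);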

    RollingCountBolt
    @Override
    public void execute(Tuple tuple) {
        if (TupleHelpers.isTickTuple(tuple)) {
            LOG.info("Received tick tuple, triggering emit of current window counts");
            emitCurrentWindowCounts();
        }
        else {
            countObjAndAck(tuple);
        }
    }
    
    private void emitCurrentWindowCounts() {
        Map<Object, Long> counts = counter.getCountsThenAdvanceWindow();
        ...
        emit(counts, actualWindowLengthInSeconds);
    }
    
    private void emit(Map<Object, Long> counts, int actualWindowLengthInSeconds) {
        for (Entry<Object, Long> entry : counts.entrySet()) {
            Object obj = entry.getKey();
            Long count = entry.getValue();
            collector.emit(new Values(obj, count, actualWindowLengthInSeconds));
        }
    }
    
    private void countObjAndAck(Tuple tuple) {
        Object obj = tuple.getValue(0);
        counter.incrementCount(obj);
        collector.ack(tuple);
    }
    

    That’s all there is to it! The new tick tuples in Storm 0.8 and the cleaned code of the bolt and its collaborators also make the code much more testable (the new code of this bolt has 98% test coverage). Compare the code above to the old implementation of the bolt and decide for yourself which one you’d prefer adapting or maintaining:

    RollingCountObjects BEFORE Storm tick tuples and refactoring
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
        cleaner = new Thread(new Runnable() {
            public void run() {
                Integer lastBucket = currentBucket(_numBuckets);
    
                while(true) {
                  int currBucket = currentBucket(_numBuckets);
                  if(currBucket!=lastBucket) {
                      int bucketToWipe = (currBucket + 1) % _numBuckets;
                      synchronized(_objectCounts) {
                          Set objs = new HashSet(_objectCounts.keySet());
                          for (Object obj: objs) {
                            long[] counts = _objectCounts.get(obj);
                            long currBucketVal = counts[bucketToWipe];
                            counts[bucketToWipe] = 0;
                            long total = totalObjects(obj);
                            if(currBucketVal!=0) {
                                _collector.emit(new Values(obj, total));
                            }
                            if(total==0) {
                                _objectCounts.remove(obj);
                            }
                          }
                      }
                      lastBucket = currBucket;
                  }
                  long delta = millisPerBucket(_numBuckets) - (System.currentTimeMillis() % millisPerBucket(_numBuckets));
                  Utils.sleep(delta);
                }
            }
        });
        cleaner.start();
    }
    
    public void execute(Tuple tuple) {
        Object obj = tuple.getValue(0);
        int bucket = currentBucket(_numBuckets);
        synchronized(_objectCounts) {
            long[] curr = _objectCounts.get(obj);
            if(curr==null) {
                curr = new long[_numBuckets];
                _objectCounts.put(obj, curr);
            }
            curr[bucket]++;
            _collector.emit(new Values(obj, totalObjects(obj)));
            _collector.ack(tuple);
        }
    }
    

    Unit Test Example

    Since I mentioned unit testing a couple of times in the previous section, let me briefly discuss this point in further detail. I implemented the unit tests with TestNG, Mockito and FEST-Assert. Here is an example unit test for RollingCountBolt, taken from RollingCountBoltTest.

    Example unit test
    @Test
    public void shouldEmitNothingIfNoObjectHasBeenCountedYetAndTickTupleIsReceived() {
        // given
        Tuple tickTuple = MockTupleHelpers.mockTickTuple();
        RollingCountBolt bolt = new RollingCountBolt();
        Map conf = mock(Map.class);
        TopologyContext context = mock(TopologyContext.class);
        OutputCollector collector = mock(OutputCollector.class);
        bolt.prepare(conf, context, collector);
    
        // when
        bolt.execute(tickTuple);
    
        // then
        verifyZeroInteractions(collector);
    }
    

    AbstractRankerBolt

    This abstract bolt provides the basic behavior of bolts that rank objects according to their natural order. It uses the template method design pattern for its execute() method to allow actual bolt implementations to specify how incoming tuples are processed, i.e. how the objects embedded within those tuples are retrieved and counted.

    This bolt has a private Rankings field to rank incoming tuples (those must contain Rankable objects, of course) according to their natural order.

    AbstractRankerBolt
    /**
     * This method functions as a template method (design pattern).
     */
    @Override
    public final void execute(Tuple tuple, BasicOutputCollector collector) {
        if (TupleHelpers.isTickTuple(tuple)) {
            getLogger().info("Received tick tuple, triggering emit of current rankings");
            emitRankings(collector);
        }
        else {
            updateRankingsWithTuple(tuple);
        }
    }
    
    abstract void updateRankingsWithTuple(Tuple tuple);
    

    The two actual implementations used in the Rolling Top Words topology, IntermediateRankingsBolt and TotalRankingsBolt, only need to implement the updateRankingsWithTuple() method.

    IntermediateRankingsBolt

    This bolt extends AbstractRankerBolt and ranks incoming objects by their count in order to produce intermediate rankings. This type of aggregation is similar to the functionality of a combiner in Hadoop. The topology runs many such intermediate ranking bolts in parallel to distribute the load of processing the incoming rolling counts from the RollingCountBolt instances.

    This bolt only needs to override updateRankingsWithTuple() of AbstractRankerBolt:

    IntermediateRankingsBolt
    @Override
    void updateRankingsWithTuple(Tuple tuple) {
        Rankable rankable = RankableObjectWithFields.from(tuple);
        super.getRankings().updateWith(rankable);
    }
    

    TotalRankingsBolt

    This bolt extends AbstractRankerBolt and merges incoming intermediate Rankings emitted by the IntermediateRankingsBolt instances.

    Like IntermediateRankingsBolt, this bolt only needs to override the updateRankingsWithTuple() method:

    TotalRankingsBolt
    @Override
    void updateRankingsWithTuple(Tuple tuple) {
        Rankings rankingsToBeMerged = (Rankings) tuple.getValue(0);
        super.getRankings().updateWith(rankingsToBeMerged);
    }
    

    Since this bolt is responsible for creating a global, consolidated ranking of currently trending topics, the topology must run only a single instance of TotalRankingsBolt. In other words, it must be a singleton in the topology.

    The bolt’s current code in storm-starter does not enforce this behavior though – instead it relies on the RollingTopWords class to configure the bolt’s parallelism correctly (if you ask yourself why it doesn’t: that was simply oversight on my part, oops). If you want to improve that, you can provide a so-called per-component Storm configuration for this bolt that sets its maximum task parallelism to 1:

    TotalRankingsBolt
    @Override
    public Map<String, Object> getComponentConfiguration() {
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, emitFrequencyInSeconds);
        // run only a single instance of this bolt in the Storm topology
        conf.setMaxTaskParallelism(1);
        return conf;
    }
    

    RollingTopWords

    The class RollingTopWords ties all the previously discussed code pieces together. It implements the actual Storm topology, configures spouts and bolts, wires them together and launches the topology in local mode (Storm’s local mode is similar to a pseudo-distributed, single-node Hadoop cluster).
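
    Conceptually, the local-mode launch boils down to the standard Storm pattern shown below. This is a simplified sketch with an illustrative class and topology name, not a verbatim excerpt of RollingTopWords:

    Sketch: running a topology in local mode for one minute

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.utils.Utils;

    public class LocalModeRunner {

        public static void main(String[] args) {
            TopologyBuilder builder = new TopologyBuilder();
            // ... setSpout()/setBolt() wiring as shown in "Overview of the Topology" ...

            Config conf = new Config();
            conf.setDebug(true); // makes Storm log emitted tuples, as in the example output below

            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("rollingTopWords", conf, builder.createTopology());
            Utils.sleep(60 * 1000); // let the topology run for one minute
            cluster.shutdown();
        }
    }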

    By default, it will produce the top 5 rolling words (our trending topics) and run for one minute before terminating. If you want to twiddle with the topology’s configuration settings, here are the most important ones:

    • Configure the number of generated trending topics by setting the TOP_N constant in RollingTopWords.
    • Configure the length and emit frequencies (both in seconds) for the sliding window counting in the constructor of RollingCountBolt in RollingTopWords#wireTopology() (see the example after this list).
    • Similarly, configure the emit frequencies (in seconds) of the ranking bolts by using their corresponding constructors.
    • Configure the parallelism of the topology by setting the parallelism_hint parameter of each bolt and spout accordingly.
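
    For example, switching to the five-minute window that is emitted once per minute (the example values mentioned in the caption of Figure 6) only requires changing the constructor arguments in the wiring shown earlier, assuming the first argument is the window length and the second the emit frequency as in the (9, 3) call above; the emit frequencies of the ranking bolts are adjusted the same way via their constructors:

    // 300-second (five-minute) sliding window, advanced and emitted every 60 seconds
    builder.setBolt(counterId, new RollingCountBolt(300, 60), 3)
                .fieldsGrouping(spoutId, new Fields("word"));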

    Apart from this there is nothing special about this class. And because we have already seen the most important code snippet from this class in the section Overview of the Topology I will not describe it any further here.

    Running the Rolling Top Words topology

    Now that you know how the trending topics Storm code works it is about time we actually launch the topology! The topology is configured to run in local mode, which means you can just grab the code to your development box and launch it right away. You do not need any special Storm cluster installation or similar setup.

    First you must checkout the latest code of the storm-starter project from GitHub:

    $ git clone git://github.com/nathanmarz/storm-starter.git
    

    Then you compile and run the Rolling Top Words topology:

    # make sure you are in the top-level directory of the storm-starter repo
    $ cd storm-starter
    
    $ mvn -f m2-pom.xml compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=storm.starter.RollingTopWords
    

    By default the topology will run for one minute and then terminate automatically.

    More information about running and packaging the code for use in a Storm cluster is available in the repo’s README file.

    Example Logging Output

    Here is some example logging output of the topology. The first column is the current time in milliseconds since the topology was started (i.e. it is 0 at the very beginning). The second column is the ID of the thread that logged the message. I deliberately removed some entries in the log flow to make the output easier to read. For this reason please take a close look at the timestamps (first column) when you want to compare the various example outputs below.

    Also, the Rolling Top Words topology has debugging output enabled. This means that Storm itself will by default log information such as what data a bolt/spout has emitted. For that reason you will see seemingly duplicate lines in the logs below.

    Lastly, to make the logging output easier to read here is some information about the various thread IDs in this example run:

    Thread ID Java Class
    Thread-37 TestWordSpout
    Thread-39 TestWordSpout
    Thread-19 RollingCountBolt
    Thread-21 RollingCountBolt
    Thread-25 RollingCountBolt
    Thread-31 IntermediateRankingsBolt
    Thread-33 IntermediateRankingsBolt
    Thread-27 TotalRankingsBolt
    Note: The Rolling Top Words code in the storm-starter repository runs more instances of the various spouts and bolts than the code used in this article. I downscaled the settings only to make the figures etc. easier to read. This means your own logging output will look slightly different.

    The topology has just started to run. The spouts generate their first output messages:

    2056 [Thread-37] INFO  backtype.storm.daemon.task  - Emitting: wordGenerator default [golda]
    2057 [Thread-19] INFO  backtype.storm.daemon.executor  - Processing received message source: wordGenerator:11, stream: default, id: {}, [golda]
    2063 [Thread-39] INFO  backtype.storm.daemon.task  - Emitting: wordGenerator default [nathan]
    2064 [Thread-25] INFO  backtype.storm.daemon.executor  - Processing received message source: wordGenerator:12, stream: default, id: {}, [nathan]
    2069 [Thread-37] INFO  backtype.storm.daemon.task  - Emitting: wordGenerator default [mike]
    2069 [Thread-21] INFO  backtype.storm.daemon.executor  - Processing received message source: wordGenerator:13, stream: default, id: {}, [mike]

    The three RollingCountBolt instances start to emit their first sliding window counts:

    4765 [Thread-19] INFO  backtype.storm.daemon.executor  - Processing received message source: __system:-1, stream: __tick, id: {}, [3]
    4765 [Thread-19] INFO  storm.starter.bolt.RollingCountBolt  - Received tick tuple, triggering emit of current window counts
    4765 [Thread-25] INFO  backtype.storm.daemon.executor  - Processing received message source: __system:-1, stream: __tick, id: {}, [3]
    4765 [Thread-25] INFO  storm.starter.bolt.RollingCountBolt  - Received tick tuple, triggering emit of current window counts
    4766 [Thread-21] INFO  backtype.storm.daemon.executor  - Processing received message source: __system:-1, stream: __tick, id: {}, [3]
    4766 [Thread-21] INFO  storm.starter.bolt.RollingCountBolt  - Received tick tuple, triggering emit of current window counts
    4766 [Thread-19] INFO  backtype.storm.daemon.task  - Emitting: counter default [golda, 24, 2]
    4766 [Thread-25] INFO  backtype.storm.daemon.task  - Emitting: counter default [nathan, 33, 2]
    4766 [Thread-21] INFO  backtype.storm.daemon.task  - Emitting: counter default [mike, 27, 2]

    The two IntermediateRankingsBolt instances emit their intermediate rankings:

    5774 [Thread-31] INFO  backtype.storm.daemon.task  - Emitting: intermediateRanker default [[[mike|27|2], [golda|24|2]]]
    5774 [Thread-33] INFO  backtype.storm.daemon.task  - Emitting: intermediateRanker default [[[bertels|31|2], [jackson|19|2]]]
    5774 [Thread-31] INFO  storm.starter.bolt.IntermediateRankingsBolt  - Rankings: [[mike|27|2], [golda|24|2]]
    5774 [Thread-33] INFO  storm.starter.bolt.IntermediateRankingsBolt  - Rankings: [[bertels|31|2], [jackson|19|2]]

    The single TotalRankingsBolt instance emits its global rankings:

    3765 [Thread-27] INFO  storm.starter.bolt.TotalRankingsBolt  - Rankings: []
    5767 [Thread-27] INFO  storm.starter.bolt.TotalRankingsBolt  - Rankings: []
    7768 [Thread-27] INFO  storm.starter.bolt.TotalRankingsBolt  - Rankings: [[nathan|33|2], [bertels|31|2], [mike|27|2], [golda|24|2], [jackson|19|2]]
    9770 [Thread-27] INFO  storm.starter.bolt.TotalRankingsBolt  - Rankings: [[bertels|76|5], [nathan|58|5], [mike|49|5], [golda|24|2], [jackson|19|2]]
    11771 [Thread-27] INFO  storm.starter.bolt.TotalRankingsBolt  - Rankings: [[bertels|76|5], [nathan|58|5], [jackson|52|5], [mike|49|5], [golda|49|5]]
    13772 [Thread-27] INFO  storm.starter.bolt.TotalRankingsBolt  - Rankings: [[bertels|110|8], [nathan|85|8], [golda|85|8], [jackson|83|8], [mike|71|8]]
    Note: During the first few seconds after startup you will observe that IntermediateRankingsBolt and TotalRankingsBolt instances will emit empty rankings. This is normal and the expected behavior – during the first seconds the RollingCountBolt instances will collect incoming words/topics and fill their sliding windows before emitting the first rolling counts to the IntermediateRankingsBolt instances. The same kind of thing happens for the combination of IntermediateRankingsBolt instances and the TotalRankingsBolt instance. This is an important behavior of the code that must be understood by downstream data consumers of the trending topics emitted by the topology.

    What I Did Not Cover

    I introduced a new feature to the Rolling Top Words code that I contributed back to storm-starter. This feature is a metric that tracks the difference between the configured length of the sliding window (in seconds) and the actual window length as seen in the emitted output data.

    4763 [Thread-25] WARN  storm.starter.bolt.RollingCountBolt  - Actual window length is 2 seconds when it should be 9 seconds (you can safely ignore this warning during the startup phase)

    This metric provides downstream data consumers with additional meta data, namely the time range that a data tuple actually covers. It is a nifty addition that will make the life of your fellow data scientists easier. Typically, you will see a difference between configured and actual window length a) during startup, for the reasons mentioned above, and b) when your machines are under high load and therefore do not respond perfectly in time. I omitted the discussion of this new feature to prevent this article from getting too long.
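
    For the curious, one simple way to derive such a metric is to remember when each window advance happened and to compare the oldest remembered timestamp against “now” at emit time. The sketch below is an illustration of that idea, not the actual storm-starter implementation (and unlike the counting data structures, a wall-clock metric inevitably has to consult the system time):

    Sketch: deriving the actual window length from the timestamps of window advances

    import java.util.Deque;
    import java.util.LinkedList;

    public class WindowLengthTracker {

        private final int numIntervalsToTrack;
        private final Deque<Long> advanceTimestampsMillis = new LinkedList<Long>();

        public WindowLengthTracker(int numIntervalsToTrack) {
            this.numIntervalsToTrack = numIntervalsToTrack;
            advanceTimestampsMillis.add(System.currentTimeMillis());
        }

        // call this whenever the sliding window advances (e.g. on each tick tuple)
        public void markAdvance() {
            advanceTimestampsMillis.add(System.currentTimeMillis());
            if (advanceTimestampsMillis.size() > numIntervalsToTrack) {
                advanceTimestampsMillis.removeFirst();
            }
        }

        // seconds actually covered by the data currently in the window
        public int actualWindowLengthInSeconds() {
            long oldest = advanceTimestampsMillis.peekFirst();
            return (int) ((System.currentTimeMillis() - oldest) / 1000);
        }
    }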

    Also, there are some minor changes in my own code that I did not contribute back to storm-starter because I did not want to introduce too many changes at once (such as a refactored TestWordSpout class).

    Summary

    In this article I described how to implement a distributed, real-time trending topics algorithm in Storm. It uses the latest features available in Storm 0.8 (namely tick tuples) and should be a good starting point for anyone trying to implement such an algorithm for their own application. The new code is now available in the official storm-starter repository, so feel free to take a deeper look.

    You might ask whether there is a use for a distributed sliding window analysis beyond the use case I presented in this article. And for sure there is. The sliding window analysis described here applies to a broader range of problems than computing trending topics. Another typical area of application is real-time infrastructure monitoring, for instance to identify broken servers by detecting a surge of errors originating from problematic machines. A similar use case is identifying attacks against your technical infrastructure, notably flood-type DDoS attacks. All of these scenarios can benefit from sliding window analyses of incoming real-time data through tools such as Storm.

    If you think the starter code can be improved further, please contribute your changes back to the storm-starter project on GitHub by forking the official repository, making your changes and sending a pull request.

    Related Links

    展开全文
  • http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/   A common pattern in real-time data workflows is performing rolling counts of incoming data points, al....

    http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/

     

    A common pattern in real-time data workflows is performing rolling counts of incoming data points, also known as sliding window analysis. A typical use case for rolling counts is identifying trending topics in a user community – such as on Twitter – where a topic is considered trending when it has been among the top N topics in a given window of time. In this article I will describe how to implement such an algorithm in a distributed and scalable fashion using the Storm real-time data processing platform. The same code can also be used in other areas such as infrastructure and security monitoring.

     

    Update 2014-06-04: I updated several references to point to the latest version of storm-starter, which is now part of the official Storm project.

     

    About Trending Topics and Sliding Windows

     

    First, let me explain what I mean by “trending topics” so that we have a common understanding. Here is an explanation taken from Wikipedia:

     

    Trending topics

    A word, phrase or topic that is tagged at a greater rate than other tags is said to be a trending topic. Trending topics become popular either through a concerted effort by users or because of an event that prompts people to talk about one specific topic. These topics help Twitter and their users to understand what is happening in the world.

    Wikipedia page on Twitter en.wikipedia.org/wiki/…

     

    In other words, it is a measure of “What’s hot?” in a user community. Typically, you are interested in trending topics for a given time span; for instance, the most popular topics in the past five minutes or the current day. So the question “What’s hot?” is more precisely stated as “What’s hot today?” or “What’s hot this week?”.

     

    In this article we assume we have a system that uses the Twitter API to pull the latest tweets from the live Twitter stream. We assume further that we have a mechanism in place that extracts topical information in the form of words from those tweets. For instance, we could opt to use a simple pattern matching algorithm that treats #hashtags in tweets as topics. Here, we would consider a tweet such as

     

    1
    
    @miguno The #Storm project rocks for real-time distributed #data processing!

     

    to “mention” the topics

     

    1
    2
    
    storm
    data

     

    We design our system so that it considers topic A more popular than topic B (for a given time span) if topic A has been mentioned more often in tweets than topic B. This means we only need to count the number of occurrences of topics in tweets.

     

    popularity(A)popularity(B)mentions(A)mentions(B)

     

    For the context of this article we do not care how the topics are actually derived from user content or user activities as long as the derived topics are represented as textual words. Then, the Storm topology described in this article will be able to identify in real-time the trending topics in this input data using a time-sensitive rolling count algorithm (rolling counts are also known as sliding windows) coupled with a ranking step. The former aspect takes care of filtering user input by time span, the latter of ranking the most trendy topics at the top the list.

     

    Eventually we want our Storm topology to periodically produce the top N of trending topics similar to the following example output, where t0 to t2 are different points in time:

     

    1
    2
    3
    4
    5
    6
    7
    
    Rank @ t0   ----->   t1   ----->   t2
    ---------------------------------------------
    1.    java   (33)   ruby   (41)   scala  (32)
    2.    php    (30)   scala  (28)   python (29)
    3.    scala  (21)   java   (27)   ruby   (24)
    4.    ruby   (16)   python (21)   java   (21)
    5.    python (15)   php    (14)   erlang (18)

     

    In this example we can see that over time “scala” has become the hottest trending topic.

     

    Sliding Windows

     

    The last background aspect I want to cover are sliding windows aka rolling counts. A picture is worth a thousand words:

     

     

    Figure 1: As the sliding window advances, the slice of its input data changes. In the example above the algorithm uses the current sliding window data to compute the sum of the window’s elements.

     

    A formula might also be worth a bunch of words – ok, ok, maybe not a full thousand of them – so mathematically speaking we could formalize such a sliding-window sum algorithm as follows:

     

    where t continually advances (most often with time) and m is the window size.

     

    From size to time: If the window is advanced with time, say every N minutes, then the individual elements in the input represent data collected over the same interval of time (here: N minutes). In that case the window size is equivalent to N x m minutes. Simply speaking, if N=1 and m=5, then our sliding window algorithm emits the latest five-minute aggregates every one minute.

     

    Now that we have introduced trending topics and sliding windows we can finally start talking about writing code for Storm that implements all this in practice – large-scale, distributed, in real time.

     

    Before We Start

     

    About storm-starter

     

    The storm-starter project on GitHub provides example implementations of various real-time data processing topologies such as a simple streaming WordCount algorithm. It also includes a Rolling Top Words topology that can be used for computing trending topics, the purpose of which is exactly what I want to cover in this article.

     

    When I began to tackle trending topic analysis with Storm I expected that I could re-use most if not all of the Rolling Top Words code in storm-starter. But I soon realized that the old code would need some serious redesigning and refactoring before one could actually use it in a real-world environment – including being able to efficiently maintain and augment the code in a team of engineers across release cycles.

     

    In the next section I will briefly summarize the state of the Rolling Top Words topology before and after my refactoring to highlight some important changes and things to consider when writing your own Storm code. Then I will continue with covering the most important aspects of the new implementation in further detail. And of course I contributed the new implementation back to the Storm project.

     

    The Old Code and My Goals for the New Code

     

    Just to absolutely clear here: I am talking about the defects of the old code to highlight some typical pitfalls during software development for a distributed system such as Storm. My intention is to make other developers aware of these gotchas so that we make less mistakes in our profession. I am by no means implying that the authors of the old code did a bad job (after all, the old code was perfectly adequate to get me started with trending topics in Storm) or that the new implementation I came up with is the pinnacle of coding. :-)

     

    My initial reaction to the old code was that, frankly speaking, I had no idea what and how it was doing its job. The various logical responsibilities of the code were mixed together in the existing classes, clearly not abiding by the Single Responsibility Principle. And I am not talking about academic treatments of SRP and such – I was hands-down struggling to wrap my head around the old code because of this.

    Also, I noticed a few synchronized statements and threads being launched manually, hinting at additional parallel operations beyond what the Storm framework natively provides you with. Here, I was particularly concerned with those functionalities that interacted with the system time (calls to System.currentTimeMillis()). I couldn’t help the feeling that they looked prone to concurrency issues. And my suspicions were eventually confirmed when I discovered a dirty-write bug in the RollingCountObjects bolt code for the slot-based counting (using long[]) of object occurrences. In practice this dirty-write bug in the old rolling count implementation caused data corruption, i.e. the code was not carrying out its main responsibility correctly – that of counting objects. That said I’d argue that it would not have been trivial to spot this error in the old code prior to refactoring (where it was eventually plain to see), so please don’t think it was just negligence on the part of the original authors. With the new tick tuple feature in Storm 0.8 I was feeling confident that this part of the code could be significantly simplified and fixed.

     

    In general I figured that completely refactoring the code and untangling these responsibilities would not only make the code more approachable and readable for me and others – after all the storm-starter code’s main purpose is to jumpstart Storm beginners – but it would also allow me to write meaningful unit tests, which would have been very difficult to do with the old code.



     

The design and implementation that I will describe in the following sections are the result of a number of refactoring iterations. I started with smaller code changes that served primarily to help me understand the existing code better (e.g. more meaningful variable names, splitting long methods into smaller logical units). The more comfortable I felt, the more substantial the changes I introduced became. Unfortunately the existing code was not accompanied by any unit tests, so while refactoring I was in the dark, risking breaking something that I was not even aware of. I considered writing unit tests for the existing code first and then going back to refactoring, but I figured that this would not be the best approach given the state of the code and the time I had available.

    In summary my goals for the new trending topics implementation were:

    1. The new code should be clean and easy to understand, both for the benefit of other developers when adapting or maintaining the code and for reasoning about its correctness. Notably, the code should decouple its data structures from the Storm sub-system and, if possible, favor native Storm features for concurrency instead of custom approaches.
    2. The new code should be covered by meaningful unit tests.
    3. The new code should be good enough to contribute it back to the Storm project to help its community.

    Implementing the Data Structures

    Eventually I settled down to the following core data structures for the new distributed Rolling Count algorithm. As you will see, an interesting characteristic is that these data structures are completely decoupled from any Storm internals. Our Storm bolts will make use of them, of course, but there is no dependency in the opposite direction from the data structures to Storm.

Another notable improvement is that the new code removes any need for and use of concurrency-related code such as synchronized statements or manually started background threads. Also, none of the data structures interact with the system time. Eliminating direct calls to system time and manually started background threads makes the new code much simpler and more testable than before.

    No more interacting with system time in the low level data structures, yay!
    // such code from the old RollingCountObjects bolt is not needed anymore
    long delta = millisPerBucket(_numBuckets)
                   - (System.currentTimeMillis() % millisPerBucket(_numBuckets));
    Utils.sleep(delta);
    

    SlotBasedCounter

    The SlotBasedCounter class provides per-slot counts of the occurrences of objects. The number of slots of a given counter instance is fixed. The class provides four public methods:

    SlotBasedCounter API
public void incrementCount(T obj, int slot);
public void wipeSlot(int slot);
public long getCount(T obj, int slot);
// get the *total* counts of all objects across all slots
public Map<T, Long> getCounts();
    

    Here is a usage example:

    Using SlotBasedCounter
// we want to count Objects using five slots
SlotBasedCounter<Object> counter = new SlotBasedCounter<Object>(5);

// counting
Object trackMe = ...;
int currentSlot = 0;
counter.incrementCount(trackMe, currentSlot);

// the count of an object for a given slot
long count = counter.getCount(trackMe, currentSlot);

// the total counts (across all slots) of all objects
Map<Object, Long> totalCounts = counter.getCounts();
    

    Internally SlotBasedCounter is backed by a Map<T, long[]> for the actual count state. You might be surprised to see the low-level long[] array here – wouldn’t it be better OO style to introduce a new, separate class that is just used for the counting of a single slot, and then we use a couple of these single-slot counters to form the SlotBasedCounter? Well, yes we could. But for performance reasons and for not deviating too far from the old code I decided not to go down this route. Apart from updating the counter – which is a WRITE operation – the most common operation in our use case is a READ operation to get the total counts of tracked objects. Here, we must calculate the sum of an object’s counts across all slots. And for this it is preferable to have the individual data points for an object close to each other (kind of data locality), which the long[] array allows us to do. Your mileage may vary though.
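
    To make the data layout more concrete, here is a condensed sketch of the counting logic behind such a slot-based counter. This is a simplification for illustration purposes only – the class name and details differ from the actual storm-starter implementation:

    Sketch of a Map<T, long[]>-backed slot-based counter (simplified)

    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.Map;

    public class SlotBasedCounterSketch<T> implements Serializable {

      // one long[] of per-slot counts per tracked object
      private final Map<T, long[]> objToCounts = new HashMap<T, long[]>();
      private final int numSlots;

      public SlotBasedCounterSketch(int numSlots) {
        this.numSlots = numSlots;
      }

      public void incrementCount(T obj, int slot) {
        long[] counts = objToCounts.get(obj);
        if (counts == null) {
          counts = new long[numSlots];
          objToCounts.put(obj, counts);
        }
        counts[slot]++;
      }

      public long getCount(T obj, int slot) {
        long[] counts = objToCounts.get(obj);
        return counts == null ? 0L : counts[slot];
      }

      // total counts per object = sum across all slots; summing a long[] benefits
      // from the data locality mentioned above
      public Map<T, Long> getCounts() {
        Map<T, Long> result = new HashMap<T, Long>();
        for (Map.Entry<T, long[]> entry : objToCounts.entrySet()) {
          long total = 0;
          for (long count : entry.getValue()) {
            total += count;
          }
          result.put(entry.getKey(), total);
        }
        return result;
      }

      public void wipeSlot(int slot) {
        for (long[] counts : objToCounts.values()) {
          counts[slot] = 0;
        }
      }
    }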

    Figure 2: The SlotBasedCounter class keeps track of multiple counts of a given object. In the example above, the SlotBasedCounter has five logical slots which allows you to track up to five counts per object.

    The SlotBasedCounter is a primitive class that can be used, for instance, as a building block for implementing sliding window counting of objects. And this is exactly what I will describe in the next section.

    SlidingWindowCounter

The SlidingWindowCounter class provides rolling counts of the occurrences of “things”, i.e. a sliding window count for each tracked object. Its counting functionality is based on the previously described SlotBasedCounter. The size of the sliding window is equivalent to the (fixed) number of slots of a given SlidingWindowCounter instance. It is used by RollingCountBolt for counting incoming data tuples.

    The class provides two public methods:

    SlidingWindowCounter API
public void incrementCount(T obj);
public Map<T, Long> getCountsThenAdvanceWindow();
    

    What might be surprising to some readers is that this class does not have any notion of time even though “sliding window” normally means a time-based window of some kind. In our case however the window does not advance with time but whenever (and only when) the method getCountsThenAdvanceWindow() is called. This means SlidingWindowCounter behaves just like a normal ring buffer in terms of advancing from one window to the next.
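
    Conceptually, SlidingWindowCounter only needs to wrap a slot-based counter and remember which slot is the current “head” of the window. The following sketch shows the idea, building on the SlotBasedCounterSketch from above (again simplified for illustration; the actual storm-starter class also prunes objects whose total count has dropped to zero):

    Sketch of a ring-buffer-style sliding window counter (simplified)

    import java.io.Serializable;
    import java.util.Map;

    public class SlidingWindowCounterSketch<T> implements Serializable {

      private final SlotBasedCounterSketch<T> objCounter;
      private final int windowLengthInSlots;
      private int headSlot;
      private int tailSlot;

      public SlidingWindowCounterSketch(int windowLengthInSlots) {
        this.windowLengthInSlots = windowLengthInSlots;
        this.objCounter = new SlotBasedCounterSketch<T>(windowLengthInSlots);
        this.headSlot = 0;
        this.tailSlot = slotAfter(headSlot);
      }

      // normal tuples only ever touch the current head slot
      public void incrementCount(T obj) {
        objCounter.incrementCount(obj, headSlot);
      }

      // return the counts of the full window, then advance by one slot:
      // the oldest slot (the tail) is wiped and becomes the new head
      public Map<T, Long> getCountsThenAdvanceWindow() {
        Map<T, Long> counts = objCounter.getCounts();
        objCounter.wipeSlot(tailSlot);
        headSlot = tailSlot;
        tailSlot = slotAfter(tailSlot);
        return counts;
      }

      private int slotAfter(int slot) {
        return (slot + 1) % windowLengthInSlots;
      }
    }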

Note: While working on the code I realized that parts of my redesign decisions – teasing apart the concerns – were close in spirit to those of the LMAX Disruptor concurrent ring buffer, albeit much simpler of course. Firstly, to limit concurrent access to the relevant data structures (here: mostly what SlidingWindowCounter is being used for); in my case I followed the SRP and split the concerns into new data structures in a way that actually allowed me to eliminate the need for ANY concurrent access. Secondly, to put a strict sequencing concept in place (the way incrementCount(T obj) and getCountsThenAdvanceWindow() interact) that prevents dirty reads or dirty writes from happening, as was unfortunately possible in the old, system-time-based code.

    If you have not heard about LMAX Disruptor before, make sure to read their LMAX technical paper (PDF) on the LMAX homepage for inspirations. It’s worth the time!

    Figure 3: The SlidingWindowCounter class keeps track of multiple rolling counts of objects, i.e. a sliding window count for each tracked object. Please note that the example of an 8-slot sliding window counter above is simplified as it only shows a single count per slot. In reality SlidingWindowCounter tracks multiple counts for multiple objects.

    Here is an illustration showing the behavior of SlidingWindowCounter over multiple iterations:

    Figure 4: Example of SlidingWindowCounter behavior for a counter of size 4. Again, the example is simplified as it only shows a single count per slot.

    Rankings and Rankable

    The Rankings class represents fixed-size rankings of objects, for instance to implement “Top 10” rankings. It ranks its objects descendingly according to their natural order, i.e. from largest to smallest. This class is used by AbstractRankerBolt and its derived bolts to track the current rankings of incoming objects over time.

    Note: The Rankings class itself is completely unaware of the bolts’ time-based behavior.

    The class provides five public methods:

    Rankings API
    public void updateWith(Rankable r);
    public void updateWith(Rankings other);
    public List<Rankable> getRankings();
    public int maxSize(); // as supplied to constructor
    public int size(); // current size, might be less than maximum size
    

    Whenever you update Rankings with new data, it will discard any elements that are smaller than the updated top N, where N is the maximum size of the Rankings instance (e.g. 10 for a top 10 ranking).

Now the sorting aspect of the ranking is driven by the natural order of the ranked objects. In my specific case, I created a Rankable interface that in turn extends the Comparable interface. In practice, you simply pass a Rankable object to the Rankings class, and the latter will update its rankings accordingly.
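
    Condensed, the Rankable contract boils down to something like the following sketch (the actual interface in storm-starter may declare additional methods):

    Rankable (sketch)

    public interface Rankable extends Comparable<Rankable> {

      // the thing being ranked, e.g. a word/topic
      Object getObject();

      // the count that drives the natural order of Rankables
      long getCount();
    }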

    Using the Rankings class
Rankings topTen = new Rankings(10);
Rankable r = ...;
topTen.updateWith(r);
List<Rankable> rankings = topTen.getRankings();
    

As you can see it is really straightforward and intuitive to use.

    Figure 5: The Rankings class ranks Rankable objects descendingly according to their natural order, i.e. from largest to smallest. The example above shows a Rankings instance with a maximum size of 10 and a current size of 8.

    The concrete class implementing Rankable is RankableObjectWithFields. The bolt IntermediateRankingsBolt, for instance, creates Rankables from incoming data tuples via a factory method of this class:

    IntermediateRankingsBolt.java
    @Override
    void updateRankingsWithTuple(Tuple tuple) {
      Rankable rankable = RankableObjectWithFields.from(tuple);
      super.getRankings().updateWith(rankable);
    }
    

    Have a look at Rankings, Rankable and RankableObjectWithFields for details. If you run into a situation where you have to implement classes like these yourself, make sure you follow good engineering practice and add standard methods such as equals() and hashCode() as well to your data structures.
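
    For illustration, here is roughly what equals() and hashCode() could look like for a simple rankable value object that carries an object and its count (the field names obj and count are hypothetical for this sketch; the actual RankableObjectWithFields holds additional state such as the extra tuple fields):

    equals() and hashCode() for a rankable value object (sketch)

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof RankableObjectWithFields)) {
        return false;
      }
      RankableObjectWithFields other = (RankableObjectWithFields) o;
      // 'obj' and 'count' are the hypothetical fields of this sketch
      return obj.equals(other.obj) && count == other.count;
    }

    @Override
    public int hashCode() {
      int result = 17;
      result = 31 * result + obj.hashCode();
      result = 31 * result + (int) (count ^ (count >>> 32));
      return result;
    }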

    Implementing the Rolling Top Words Topology

    So where are we? In the sections above we have already discussed a number of Java classes but not even a single one of them has been directly related to Storm. It’s about time that we start writing some Storm code!

    In the following sections I will describe the Storm components that make up the Rolling Top Words topology. When reading the sections keep in mind that the “words” in this topology represent the topics that are currently being mentioned by the users in our imaginary system.

    Overview of the Topology

    The high-level view of the Rolling Top Words topology is shown in the figure below.

    Figure 6: The Rolling Top Words topology consists of instances of TestWordSpout, RollingCountBolt, IntermediateRankingsBolt and TotalRankingsBolt. The length of the sliding window (in secs) as well as the various emit frequencies (in secs) are just example values – depending on your use case you would, for instance, prefer to have a sliding window of five minutes and emit the latest rolling counts every minute.

    The main responsibilities are split as follows:

    1. In the first layer the topology runs many TestWordSpout instances in parallel to simulate the load of incoming data – in our case this would be the names of the topics (represented as words) that are currently being mentioned by our users.
    2. The second layer comprises multiple instances of RollingCountBolt, which perform a rolling count of incoming words/topics.
    3. The third layer uses multiple instances of IntermediateRankingsBolt (“I.R. Bolt” in the figure) to distribute the load of pre-aggregating the various incoming rolling counts into intermediate rankings. Hadoop users will see a strong similarity here to the functionality of a combiner in Hadoop.
4. Lastly, there is the final step in the topology. Here, a single instance of TotalRankingsBolt aggregates the incoming intermediate rankings into a global, consolidated total ranking. The output of this bolt is the list of currently trending topics in the system. These trending topics can then be used by downstream data consumers to provide all the cool user-facing and backend features you want to have in your platform.

    In code the topology wiring looks as follows in RollingTopWords:

    RollingTopWords.java
    builder.setSpout(spoutId, new TestWordSpout(), 2);
    builder.setBolt(counterId, new RollingCountBolt(9, 3), 3)
                .fieldsGrouping(spoutId, new Fields("word"));
    builder.setBolt(intermediateRankerId, new IntermediateRankingsBolt(TOP_N), 2)
                .fieldsGrouping(counterId, new Fields("obj"));
    builder.setBolt(totalRankerId, new TotalRankingsBolt(TOP_N))
                .globalGrouping(intermediateRankerId);
    
    Note: The integer parameters of the setSpout() and setBolt() methods (do not confuse them with the integer parameters of the bolt constructors) configure the parallelism of the Storm components. See my article Understanding the Parallelism of a Storm Topology for details.

    TestWordSpout

    The only spout we will be using is the TestWordSpout that is part of backtype.storm.testing package of Storm itself. I will not cover the spout in detail because it is a trivial class. The only thing it does is to select a random word from a fixed list of five words (“nathan”, “mike”, “jackson”, “golda”, “bertels”) and emit that word to the downstream topology every 100ms. For the sake of this article, we consider these words to be our “topics”, of which we want to identify the trending ones.

    Note: Because TestWordSpout selects its output words at random (and each word having the same probability of being selected) in most cases the counts of the various words are pretty close to each other. This is ok for example code such as ours. In a production setting though you most likely want to generate “better” simulation data.

    The spout’s output can be visualized as follows. Note that the @XXXms milliseconds timeline is not part of the actual output.

    @100ms: nathan
    @200ms: golda
    @300ms: golda
    @400ms: jackson
    @500ms: mike
    @600ms: nathan
    @700ms: bertels
    ...

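    As mentioned above, for more realistic test runs you would want input data that is skewed rather than uniformly random. A spout like the following hypothetical SkewedWordSpout (not part of storm-starter; shown only to illustrate the idea) would make some topics clearly “hotter” than others:

    SkewedWordSpout (hypothetical example)

    import java.util.Map;
    import java.util.Random;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import backtype.storm.utils.Utils;

    public class SkewedWordSpout extends BaseRichSpout {

      private static final long serialVersionUID = 1L;

      // crude skew: "storm" is three times as likely as "hadoop", "java" or "scala"
      private static final String[] WORDS =
          { "storm", "storm", "storm", "data", "data", "hadoop", "java", "scala" };

      private SpoutOutputCollector collector;
      private Random rand;

      @Override
      public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.rand = new Random();
      }

      @Override
      public void nextTuple() {
        Utils.sleep(100); // emit roughly every 100 ms, just like TestWordSpout
        String word = WORDS[rand.nextInt(WORDS.length)];
        collector.emit(new Values(word));
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
      }
    }
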
    Excursus: Tick Tuples in Storm 0.8+

A new and very helpful (read: awesome) feature of Storm 0.8 is the so-called tick tuple. Whenever you want a spout or bolt to execute a task at periodic intervals – in other words, whenever you want to trigger an event or activity – using a tick tuple is normally the best practice.

    Nathan Marz described tick tuples in the Storm 0.8 announcement as follows:

    Tick tuples: It’s common to require a bolt to “do something” at a fixed interval, like flush writes to a database. Many people have been using variants of a ClockSpout to send these ticks. The problem with a ClockSpout is that you can’t internalize the need for ticks within your bolt, so if you forget to set up your bolt correctly within your topology it won’t work correctly. 0.8.0 introduces a new “tick tuple” config that lets you specify the frequency at which you want to receive tick tuples via the “topology.tick.tuple.freq.secs” component-specific config, and then your bolt will receive a tuple from the __system component and __tick stream at that frequency.

    Nathan Marz on the Storm mailing list groups.google.com/forum/#!msg/…

    Here is how you configure a bolt/spout to receive tick tuples every 10 seconds:

    Configuring a bolt/spout to receive tick tuples every 10 seconds
    @Override
    public Map<String, Object> getComponentConfiguration() {
      Config conf = new Config();
      int tickFrequencyInSeconds = 10;
      conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, tickFrequencyInSeconds);
      return conf;
    }
    

    Usually you will want to add a conditional switch to the component’s execute method to tell tick tuples and “normal” tuples apart:

    Telling tick tuples and normal tuples apart
    @Override
    public void execute(Tuple tuple) {
      if (isTickTuple(tuple)) {
        // now you can trigger e.g. a periodic activity
      }
      else {
        // do something with the normal tuple
      }
    }
    private static boolean isTickTuple(Tuple tuple) {
      return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
        && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
    }
    

    Be aware that tick tuples are sent to bolts/spouts just like “regular” tuples, which means they will be queued behind other tuples that a bolt/spout is about to process via its execute() or nextTuple() method, respectively. As such the time interval you configure for tick tuples is, in practice, served on a “best effort” basis. For instance, if a bolt is suffering from high execution latency – e.g. due to being overwhelmed by the incoming rate of regular, non-tick tuples – then you will observe that the periodic activities implemented in the bolt will get triggered later than expected.

    I hope that, like me, you can appreciate the elegance of solely using Storm’s existing primitives to implement the new tick tuple feature. :-)

    RollingCountBolt

    This bolt performs rolling counts of incoming objects, i.e. sliding window based counting. Accordingly it uses the SlidingWindowCounter class described above to achieve this. In contrast to the old implementation only this bolt (more correctly: the instances of this bolt that run as Storm tasks) is interacting with the SlidingWindowCounter data structure. Each instance of the bolt has its own private SlidingWindowCounter field, which eliminates the need for any custom inter-thread communication and synchronization.

The bolt combines the previously described tick tuples (which trigger at fixed intervals in time) with the time-agnostic behavior of SlidingWindowCounter to achieve time-based sliding window counting. Whenever the bolt receives a tick tuple, it will advance the window of its private SlidingWindowCounter instance and emit its latest rolling counts. In the case of normal tuples it will simply count the object and ack the tuple.

    RollingCountBolt
@Override
public void execute(Tuple tuple) {
  if (TupleHelpers.isTickTuple(tuple)) {
    LOG.info("Received tick tuple, triggering emit of current window counts");
    emitCurrentWindowCounts();
  }
  else {
    countObjAndAck(tuple);
  }
}

private void emitCurrentWindowCounts() {
  Map<Object, Long> counts = counter.getCountsThenAdvanceWindow();
  ...
  emit(counts, actualWindowLengthInSeconds);
}

private void emit(Map<Object, Long> counts, int actualWindowLengthInSeconds) {
  for (Entry<Object, Long> entry : counts.entrySet()) {
    Object obj = entry.getKey();
    Long count = entry.getValue();
    collector.emit(new Values(obj, count, actualWindowLengthInSeconds));
  }
}

private void countObjAndAck(Tuple tuple) {
  Object obj = tuple.getValue(0);
  counter.incrementCount(obj);
  collector.ack(tuple);
}
    

    That’s all there is to it! The new tick tuples in Storm 0.8 and the cleaned code of the bolt and its collaborators also make the code much more testable (the new code of this bolt has 98% test coverage). Compare the code above to the old implementation of the bolt and decide for yourself which one you’d prefer adapting or maintaining:

    RollingCountObjects BEFORE Storm tick tuples and refactoring
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
        cleaner = new Thread(new Runnable() {
            public void run() {
                Integer lastBucket = currentBucket(_numBuckets);
                while(true) {
                  int currBucket = currentBucket(_numBuckets);
                  if(currBucket!=lastBucket) {
                      int bucketToWipe = (currBucket + 1) % _numBuckets;
                      synchronized(_objectCounts) {
                          Set objs = new HashSet(_objectCounts.keySet());
                          for (Object obj: objs) {
                            long[] counts = _objectCounts.get(obj);
                            long currBucketVal = counts[bucketToWipe];
                            counts[bucketToWipe] = 0;
                            long total = totalObjects(obj);
                            if(currBucketVal!=0) {
                                _collector.emit(new Values(obj, total));
                            }
                            if(total==0) {
                                _objectCounts.remove(obj);
                            }
                          }
                      }
                      lastBucket = currBucket;
                  }
                  long delta = millisPerBucket(_numBuckets) - (System.currentTimeMillis() % millisPerBucket(_numBuckets));
                  Utils.sleep(delta);
                }
            }
        });
        cleaner.start();
    }
    public void execute(Tuple tuple) {
        Object obj = tuple.getValue(0);
        int bucket = currentBucket(_numBuckets);
        synchronized(_objectCounts) {
            long[] curr = _objectCounts.get(obj);
            if(curr==null) {
                curr = new long[_numBuckets];
                _objectCounts.put(obj, curr);
            }
            curr[bucket]++;
            _collector.emit(new Values(obj, totalObjects(obj)));
            _collector.ack(tuple);
        }
    }
    

    Unit Test Example

    Since I mentioned unit testing a couple of times in the previous section, let me briefly discuss this point in further detail. I implemented the unit tests with TestNG, Mockito and FEST-Assert. Here is an example unit test for RollingCountBolt, taken from RollingCountBoltTest.

    Example unit test
    @Test
    public void shouldEmitNothingIfNoObjectHasBeenCountedYetAndTickTupleIsReceived() {
      // given
      Tuple tickTuple = MockTupleHelpers.mockTickTuple();
      RollingCountBolt bolt = new RollingCountBolt();
      Map conf = mock(Map.class);
      TopologyContext context = mock(TopologyContext.class);
      OutputCollector collector = mock(OutputCollector.class);
      bolt.prepare(conf, context, collector);
      // when
      bolt.execute(tickTuple);
      // then
      verifyZeroInteractions(collector);
    }
    
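    In the same spirit, a companion test for the normal-tuple code path might look roughly like the sketch below. It is not part of the actual test class in storm-starter; in particular, the hand-rolled mocking of a non-tick tuple is just one possible (hypothetical) setup:

    Sketch of a test for the normal-tuple path

    @Test
    public void shouldAckNormalTupleWithoutEmitting() {
      // given: a mocked tuple that does not originate from the __system/__tick stream
      Tuple normalTuple = mock(Tuple.class);
      when(normalTuple.getSourceComponent()).thenReturn("wordGenerator");
      when(normalTuple.getSourceStreamId()).thenReturn("default");
      when(normalTuple.getValue(0)).thenReturn("someWord");
      RollingCountBolt bolt = new RollingCountBolt();
      Map conf = mock(Map.class);
      TopologyContext context = mock(TopologyContext.class);
      OutputCollector collector = mock(OutputCollector.class);
      bolt.prepare(conf, context, collector);

      // when
      bolt.execute(normalTuple);

      // then: the tuple is acked, but nothing is emitted until the next tick tuple
      verify(collector).ack(normalTuple);
      verifyNoMoreInteractions(collector);
    }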

    AbstractRankerBolt

    This abstract bolt provides the basic behavior of bolts that rank objects according to their natural order. It uses the template method design pattern for its execute() method to allow actual bolt implementations to specify how incoming tuples are processed, i.e. how the objects embedded within those tuples are retrieved and counted.

    This bolt has a private Rankings field to rank incoming tuples (those must contain Rankable objects, of course) according to their natural order.

    AbstractRankerBolt
    // This method functions as a template method (design pattern).
    @Override
    public final void execute(Tuple tuple, BasicOutputCollector collector) {
      if (TupleHelpers.isTickTuple(tuple)) {
        getLogger().info("Received tick tuple, triggering emit of current rankings");
        emitRankings(collector);
      }
      else {
        updateRankingsWithTuple(tuple);
      }
    }
    abstract void updateRankingsWithTuple(Tuple tuple);
    

    The two actual implementations used in the Rolling Top Words topology, IntermediateRankingsBolt and TotalRankingsBolt, only need to implement the updateRankingsWithTuple() method.

    IntermediateRankingsBolt

This bolt extends AbstractRankerBolt and ranks incoming objects by their count in order to produce intermediate rankings. This type of aggregation is similar to the functionality of a combiner in Hadoop. The topology runs many such intermediate ranking bolts in parallel to distribute the load of processing the incoming rolling counts from the RollingCountBolt instances.

    This bolt only needs to override updateRankingsWithTuple() of AbstractRankerBolt:

    IntermediateRankingsBolt
    @Override
    void updateRankingsWithTuple(Tuple tuple) {
      Rankable rankable = RankableObjectWithFields.from(tuple);
      super.getRankings().updateWith(rankable);
    }
    

    TotalRankingsBolt

    This bolt extends AbstractRankerBolt and merges incoming intermediate Rankings emitted by the IntermediateRankingsBolt instances.

    Like IntermediateRankingsBolt, this bolt only needs to override the updateRankingsWithTuple() method:

    TotalRankingsBolt
    @Override
    void updateRankingsWithTuple(Tuple tuple) {
      Rankings rankingsToBeMerged = (Rankings) tuple.getValue(0);
      super.getRankings().updateWith(rankingsToBeMerged);
    }
    

    Since this bolt is responsible for creating a global, consolidated ranking of currently trending topics, the topology must run only a single instance of TotalRankingsBolt. In other words, it must be a singleton in the topology.

    The bolt’s current code in storm-starter does not enforce this behavior though – instead it relies on the RollingTopWords class to configure the bolt’s parallelism correctly (if you ask yourself why it doesn’t: that was simply oversight on my part, oops). If you want to improve that, you can provide a so-called per-component Storm configuration for this bolt that sets its maximum task parallelism to 1:

    TotalRankingsBolt
@Override
public Map<String, Object> getComponentConfiguration() {
  Config conf = new Config();
  conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, emitFrequencyInSeconds);
  // run only a single instance of this bolt in the Storm topology
  conf.setMaxTaskParallelism(1);
  return conf;
}
    

    RollingTopWords

    The class RollingTopWords ties all the previously discussed code pieces together. It implements the actual Storm topology, configures spouts and bolts, wires them together and launches the topology in local mode (Storm’s local mode is similar to a pseudo-distributed, single-node Hadoop cluster).

By default, it will produce the top 5 rolling words (our trending topics) and run for one minute before terminating. If you want to tweak the topology's configuration settings, here are the most important ones:

    • Configure the number of generated trending topics by setting the TOP_N constant in RollingTopWords.
    • Configure the length and emit frequency (both in seconds) of the sliding window counting in the constructor of RollingCountBolt in RollingTopWords#wireTopology() (see the sketch after this list).
    • Similarly, configure the emit frequencies (in seconds) of the ranking bolts by using their corresponding constructors.
    • Configure the parallelism of the topology by setting the parallelism_hint parameter of each bolt and spout accordingly.
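
    To make these knobs concrete, the wiring in RollingTopWords could hypothetically be changed as follows to use a five-minute window that advances every 60 seconds, with the ranking bolts emitting every 15 seconds. The exact values are made up for illustration, and the sketch assumes ranking-bolt constructors of the form (TOP_N, emitFrequencyInSeconds) as described in the list above:

    Example configuration values (hypothetical)

    builder.setSpout(spoutId, new TestWordSpout(), 2);
    builder.setBolt(counterId, new RollingCountBolt(300, 60), 3)
                .fieldsGrouping(spoutId, new Fields("word"));
    builder.setBolt(intermediateRankerId, new IntermediateRankingsBolt(TOP_N, 15), 2)
                .fieldsGrouping(counterId, new Fields("obj"));
    builder.setBolt(totalRankerId, new TotalRankingsBolt(TOP_N, 15))
                .globalGrouping(intermediateRankerId);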

    Apart from this there is nothing special about this class. And because we have already seen the most important code snippet from this class in the section Overview of the Topology I will not describe it any further here.

    Running the Rolling Top Words topology

    Update 2014-06-04: I updated the instructions below based on the latest version of storm-starter, which is now part of the official Storm project.

    Now that you know how the trending topics Storm code works it is about time we actually launch the topology! The topology is configured to run in local mode, which means you can just grab the code to your development box and launch it right away. You do not need any special Storm cluster installation or similar setup.

First you must check out the latest code of the storm-starter project from GitHub:

    $ git clone git@github.com:apache/incubator-storm.git
    $ cd incubator-storm
    

    Then you must build and install the (latest) Storm jars locally, see the storm-starter README:

    # Must be run from the top-level directory of the Storm code repository
    $ mvn clean install -DskipTests=true
    

    Now you can compile and run the RollingTopWords topology:

    $ cd examples/storm-starter
    $ mvn compile exec:java -Dstorm.topology=storm.starter.RollingTopWords
    

    By default the topology will run for one minute and then terminate automatically.

    Example Logging Output

Here is some example logging output of the topology. The first column is the current time in milliseconds since the topology was started (i.e. it is 0 at the very beginning). The second column is the ID of the thread that logged the message. I deliberately removed some entries from the log flow to make the output easier to read. For this reason please take a close look at the timestamps (first column) when you want to compare the various example outputs below.

    Also, the Rolling Top Words topology has debugging output enabled. This means that Storm itself will by default log information such as what data a bolt/spout has emitted. For that reason you will see seemingly duplicate lines in the logs below.

    Lastly, to make the logging output easier to read here is some information about the various thread IDs in this example run:

    Thread ID   Java Class
    Thread-37   TestWordSpout
    Thread-39   TestWordSpout
    Thread-19   RollingCountBolt
    Thread-21   RollingCountBolt
    Thread-25   RollingCountBolt
    Thread-31   IntermediateRankingsBolt
    Thread-33   IntermediateRankingsBolt
    Thread-27   TotalRankingsBolt

    Note: The Rolling Top Words code in the storm-starter repository runs more instances of the various spouts and bolts than the code used in this article. I downscaled the settings only to make the figures etc. easier to read. This means your own logging output will look slightly different.

    The topology has just started to run. The spouts generate their first output messages:

    2056 [Thread-37] INFO  backtype.storm.daemon.task  - Emitting: wordGenerator default [golda]
    2057 [Thread-19] INFO  backtype.storm.daemon.executor  - Processing received message source: wordGenerator:11, stream: default, id: {}, [golda]
    2063 [Thread-39] INFO  backtype.storm.daemon.task  - Emitting: wordGenerator default [nathan]
    2064 [Thread-25] INFO  backtype.storm.daemon.executor  - Processing received message source: wordGenerator:12, stream: default, id: {}, [nathan]
    2069 [Thread-37] INFO  backtype.storm.daemon.task  - Emitting: wordGenerator default [mike]
    2069 [Thread-21] INFO  backtype.storm.daemon.executor  - Processing received message source: wordGenerator:13, stream: default, id: {}, [mike]

    The three RollingCountBolt instances start to emit their first sliding window counts:

    4765 [Thread-19] INFO  backtype.storm.daemon.executor  - Processing received message source: __system:-1, stream: __tick, id: {}, [3]
    4765 [Thread-19] INFO  storm.starter.bolt.RollingCountBolt  - Received tick tuple, triggering emit of current window counts
    4765 [Thread-25] INFO  backtype.storm.daemon.executor  - Processing received message source: __system:-1, stream: __tick, id: {}, [3]
    4765 [Thread-25] INFO  storm.starter.bolt.RollingCountBolt  - Received tick tuple, triggering emit of current window counts
    4766 [Thread-21] INFO  backtype.storm.daemon.executor  - Processing received message source: __system:-1, stream: __tick, id: {}, [3]
    4766 [Thread-21] INFO  storm.starter.bolt.RollingCountBolt  - Received tick tuple, triggering emit of current window counts
    4766 [Thread-19] INFO  backtype.storm.daemon.task  - Emitting: counter default [golda, 24, 2]
    4766 [Thread-25] INFO  backtype.storm.daemon.task  - Emitting: counter default [nathan, 33, 2]
    4766 [Thread-21] INFO  backtype.storm.daemon.task  - Emitting: counter default [mike, 27, 2]

    The two IntermediateRankingsBolt instances emit their intermediate rankings:

    5774 [Thread-31] INFO  backtype.storm.daemon.task  - Emitting: intermediateRanker default [[[mike|27|2], [golda|24|2]]]
    5774 [Thread-33] INFO  backtype.storm.daemon.task  - Emitting: intermediateRanker default [[[bertels|31|2], [jackson|19|2]]]
    5774 [Thread-31] INFO  storm.starter.bolt.IntermediateRankingsBolt  - Rankings: [[mike|27|2], [golda|24|2]]
    5774 [Thread-33] INFO  storm.starter.bolt.IntermediateRankingsBolt  - Rankings: [[bertels|31|2], [jackson|19|2]]

    The single TotalRankingsBolt instance emits its global rankings:

    3765 [Thread-27] INFO  storm.starter.bolt.TotalRankingsBolt  - Rankings: []
    5767 [Thread-27] INFO  storm.starter.bolt.TotalRankingsBolt  - Rankings: []
    7768 [Thread-27] INFO  storm.starter.bolt.TotalRankingsBolt  - Rankings: [[nathan|33|2], [bertels|31|2], [mike|27|2], [golda|24|2], [jackson|19|2]]
    9770 [Thread-27] INFO  storm.starter.bolt.TotalRankingsBolt  - Rankings: [[bertels|76|5], [nathan|58|5], [mike|49|5], [golda|24|2], [jackson|19|2]]
    11771 [Thread-27] INFO  storm.starter.bolt.TotalRankingsBolt  - Rankings: [[bertels|76|5], [nathan|58|5], [jackson|52|5], [mike|49|5], [golda|49|5]]
    13772 [Thread-27] INFO  storm.starter.bolt.TotalRankingsBolt  - Rankings: [[bertels|110|8], [nathan|85|8], [golda|85|8], [jackson|83|8], [mike|71|8]]
Note: During the first few seconds after startup you will observe that IntermediateRankingsBolt and TotalRankingsBolt instances will emit empty rankings. This is normal and the expected behavior – during the first seconds the RollingCountBolt instances will collect incoming words/topics and fill their sliding windows before emitting the first rolling counts to the IntermediateRankingsBolt instances. The same kind of thing happens for the combination of IntermediateRankingsBolt instances and the TotalRankingsBolt instance. This is an important behavior of the code that must be understood by downstream data consumers of the trending topics emitted by the topology.

    What I Did Not Cover

    I introduced a new feature to the Rolling Top Words code that I contributed back to storm-starter. This feature is a metric that tracks the difference between the configured length of the sliding window (in seconds) and the actual window length as seen in the emitted output data.

    4763 [Thread-25] WARN  storm.starter.bolt.RollingCountBolt  - Actual window length is 2 seconds when it should be 9 seconds (you can safely ignore this warning during the startup phase)

This metric provides downstream data consumers with additional metadata, namely the time range that a data tuple actually covers. It is a nifty addition that will make the life of your fellow data scientists easier. Typically, you will see a difference between the configured and the actual window length a) during startup, for the reasons mentioned above, and b) when your machines are under high load and therefore do not respond perfectly in time. I omitted the discussion of this new feature to prevent this article from getting too long.
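
    Conceptually, such a metric can be derived by remembering the timestamps of the last N window advances (N being the number of slots) and reporting the age of the oldest one whenever the rolling counts are emitted. The sketch below illustrates the idea with a hypothetical helper class; the actual storm-starter code uses a slightly different tracker, so treat this only as an illustration:

    Tracking the actual window length (sketch)

    import java.io.Serializable;

    public class ActualWindowLengthTracker implements Serializable {

      private final long[] advanceTimesInMillis;
      private int oldest; // index of the oldest recorded advance timestamp

      public ActualWindowLengthTracker(int numSlots) {
        advanceTimesInMillis = new long[numSlots];
        long now = System.currentTimeMillis();
        for (int i = 0; i < numSlots; i++) {
          advanceTimesInMillis[i] = now;
        }
        oldest = 0;
      }

      // call this every time the sliding window advances by one slot
      public void markAsAdvanced() {
        advanceTimesInMillis[oldest] = System.currentTimeMillis();
        oldest = (oldest + 1) % advanceTimesInMillis.length;
      }

      // the age of the oldest slot in seconds, i.e. the window length actually
      // covered by the emitted counts; smaller than the configured length during
      // the startup phase, which is what the WARN message above reports
      public int actualWindowLengthInSeconds() {
        return (int) ((System.currentTimeMillis() - advanceTimesInMillis[oldest]) / 1000);
      }
    }

    In RollingCountBolt, the bolt would then read actualWindowLengthInSeconds() right before advancing the window in emitCurrentWindowCounts() and call markAsAdvanced() right after.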

    Also, there are some minor changes in my own code that I did not contribute back to storm-starter because I did not want to introduce too many changes at once (such as a refactored TestWordSpout class).

    Summary

    In this article I described how to implement a distributed, real-time trending topics algorithm in Storm. It uses the latest features available in Storm 0.8 (namely tick tuples) and should be a good starting point for anyone trying to implement such an algorithm for their own application. The new code is now available in the official storm-starter repository, so feel free to take a closer look.

You might ask whether there is a use for distributed sliding window analysis beyond the use case I presented in this article. And for sure there is. The sliding window analysis described here applies to a broader range of problems than computing trending topics. Another typical area of application is real-time infrastructure monitoring, for instance to identify broken servers by detecting a surge of errors originating from problematic machines. A similar use case is identifying attacks against your technical infrastructure, notably flood-type DDoS attacks. All of these scenarios can benefit from sliding window analyses of incoming real-time data through tools such as Storm.

    If you think the starter code can be improved further, please contribute your changes back to the storm-starter component in the official Storm project.

    settings_input_compositesettings input composite icon<mat-icon>settings_input_composite</mat-icon>
    settings_input_hdmisettings input hdmi icon<mat-icon>settings_input_hdmi</mat-icon>
    settings_input_svideosettings input svideo icon<mat-icon>settings_input_svideo</mat-icon>
    settings_overscansettings overscan icon<mat-icon>settings_overscan</mat-icon>
    settings_phonesettings phone icon<mat-icon>settings_phone</mat-icon>
    settings_powersettings power icon<mat-icon>settings_power</mat-icon>
    settings_remotesettings remote icon<mat-icon>settings_remote</mat-icon>
    settings_voicesettings voice icon<mat-icon>settings_voice</mat-icon>
    shopshop icon<mat-icon>shop</mat-icon>
    shop_twoshop two icon<mat-icon>shop_two</mat-icon>
    shopping_basketshopping basket icon<mat-icon>shopping_basket</mat-icon>
    shopping_cartshopping cart icon<mat-icon>shopping_cart</mat-icon>
    speaker_notesspeaker notes icon<mat-icon>speaker_notes</mat-icon>
    speaker_notes_offspeaker notes off icon<mat-icon>speaker_notes_off</mat-icon>
    spellcheckspellcheck icon<mat-icon>spellcheck</mat-icon>
    starsstars icon<mat-icon>stars</mat-icon>
    storestore icon<mat-icon>store</mat-icon>
    subjectsubject icon<mat-icon>subject</mat-icon>
    supervised_user_circlesupervised user circle icon<mat-icon>supervised_user_circle</mat-icon>
    supervisor_accountsupervisor account icon<mat-icon>supervisor_account</mat-icon>
    swap_horizswap horiz icon<mat-icon>swap_horiz</mat-icon>
    swap_horizontal_circleswap horizontal circle icon<mat-icon>swap_horizontal_circle</mat-icon>
    swap_vertswap vert icon<mat-icon>swap_vert</mat-icon>
    swap_vertical_circleswap vertical circle icon<mat-icon>swap_vertical_circle</mat-icon>
    tabtab icon<mat-icon>tab</mat-icon>
    tab_unselectedtab unselected icon<mat-icon>tab_unselected</mat-icon>
    text_rotate_uptext rotate up icon<mat-icon>text_rotate_up</mat-icon>
    text_rotate_verticaltext rotate vertical icon<mat-icon>text_rotate_vertical</mat-icon>
    text_rotation_downtext rotation down icon<mat-icon>text_rotation_down</mat-icon>
    text_rotation_nonetext rotation none icon<mat-icon>text_rotation_none</mat-icon>
    theaterstheaters icon<mat-icon>theaters</mat-icon>
    thumb_downthumb down icon<mat-icon>thumb_down</mat-icon>
    thumb_upthumb up icon<mat-icon>thumb_up</mat-icon>
    thumbs_up_downthumbs up down icon<mat-icon>thumbs_up_down</mat-icon>
    timelinetimeline icon<mat-icon>timeline</mat-icon>
    toctoc icon<mat-icon>toc</mat-icon>
    todaytoday icon<mat-icon>today</mat-icon>
    tolltoll icon<mat-icon>toll</mat-icon>
    touch_apptouch app icon<mat-icon>touch_app</mat-icon>
    track_changestrack changes icon<mat-icon>track_changes</mat-icon>
    translatetranslate icon<mat-icon>translate</mat-icon>
    trending_downtrending down icon<mat-icon>trending_down</mat-icon>
    trending_flattrending flat icon<mat-icon>trending_flat</mat-icon>
    trending_uptrending up icon<mat-icon>trending_up</mat-icon>
    turned_inturned in icon<mat-icon>turned_in</mat-icon>
    turned_in_notturned in not icon<mat-icon>turned_in_not</mat-icon>
    updateupdate icon<mat-icon>update</mat-icon>
    verified_userverified user icon<mat-icon>verified_user</mat-icon>
    vertical_splitvertical split icon<mat-icon>vertical_split</mat-icon>
    view_agendaview agenda icon<mat-icon>view_agenda</mat-icon>
    view_arrayview array icon<mat-icon>view_array</mat-icon>
    view_carouselview carousel icon<mat-icon>view_carousel</mat-icon>
    view_columnview column icon<mat-icon>view_column</mat-icon>
    view_dayview day icon<mat-icon>view_day</mat-icon>
    view_headlineview headline icon<mat-icon>view_headline</mat-icon>
    view_listview list icon<mat-icon>view_list</mat-icon>
    view_moduleview module icon<mat-icon>view_module</mat-icon>
    view_quiltview quilt icon<mat-icon>view_quilt</mat-icon>
    view_streamview stream icon<mat-icon>view_stream</mat-icon>
    view_weekview week icon<mat-icon>view_week</mat-icon>
    visibilityvisibility icon<mat-icon>visibility</mat-icon>
    visibility_offvisibility off icon<mat-icon>visibility_off</mat-icon>
    voice_over_offvoice over off icon<mat-icon>voice_over_off</mat-icon>
    watch_laterwatch later icon<mat-icon>watch_later</mat-icon>
    workwork icon<mat-icon>work</mat-icon>
    work_offwork off icon<mat-icon>work_off</mat-icon>
    work_outlinework outline icon<mat-icon>work_outline</mat-icon>
    youtube_searched_foryoutube searched for icon<mat-icon>youtube_searched_for</mat-icon>
    zoom_inzoom in icon<mat-icon>zoom_in</mat-icon>
    zoom_outzoom out icon<mat-icon>zoom_out</mat-icon>

    转载于:https://www.cnblogs.com/lishidefengchen/p/10691442.html

    展开全文
  • 搜索趋势工具(英文)

    2018-12-22 16:21:56
    However, with over 3.5 billion searches each day worldwide, it’s hard to know how to narrow all that data down to help you improve your SEO. Here are some of the many great tools available to help ...
  • 大家先看下目录 具体的内容请点击: ... ...下拉刷新模糊效果AutoLayout富文本图表表相关与Tabbar隐藏与显示HUD与Toast对话框其他UI ...网络连接图像获取网络聊天网络测试网页框架WebView与W
  • 保姆级教程,如何发现 GitHub 上的优质项目?

    千次阅读 多人点赞 2020-07-29 09:47:39
    把开源项目 down 到本地,然后看源码自己研究,顺带在原有的基础上补充一些功能,是不是就有项目经验了? 本身这些开源项目都是非常优质的,但功能并不会非常全面,毕竟作者的精力和时间有限。 虽然我不是 GitHub 上...
  • material-design-icons

    千次阅读 2018-08-16 15:46:54
    arrow_drop_down_circle e5c6   arrow_drop_up e5c7   arrow_forward e5c8   arrow_upward e5d8   art_track e060   aspect_ratio e85b   assessment ...
  • 如果After比Before的值大,则使用sp-field-trending–up样式,否则使用sp-field-trendingdown样式。 { " $schema ": "https://developer.microsoft.com/json-schemas/sp/column-formatting.schema.json" , ...
  • sp-field-trendingdown sp-field-quickAction 注意:上面显示的关于sp-field-severity类样式的图标并不是类的一部分,只是为了展示的样式看起来美观,样式只包含背景色。图标可以通过使用iconName...
  • 如果你不想顺着遍历,想反过来遍历,可以利用downTo (递减)关键字,从最大值到最小值递减! for(i in 9 downTo 5) print(arr[i]) println() 可能你还想隔着遍历,比如只遍历:10,7,4,1,可以用 step (步长)关键字...
  • peaks and/or is trending in the wrong direction: More Reading Wikipedia - A good, brief explanation of Load Average ; it goes a bit deeper into the mathematics Linux Journal - ...
  • Scrum Master Mock Test (3)

    千次阅读 2016-10-27 07:54:05
    Explanation: When the Scrum team has selected and committed to deliver a set of top priority features from the product backlog, the ScrumMaster leads the team in a planning session to break down ...
  • 现代编程语言的一个非常令人欣慰的事是有很多的社区在驱动语言的发展。 很多来自世界各地的程序员不求回报的写代码为别人造轮子、贡献代码、开发框架。开放源代码使得分散在世界各地的程序员们都能够贡献他们的...
  • COMP2741/8741

    2019-06-12 18:44:00
    descriptions we can detect “trending topics.” See below for more information on YouTube and the YouTube Data API that generate the list of videos used in this assignment: YouTube: ...
  • 最近picojs上了Github Trending,这是一个小巧的人脸检测库,200行JS,2K大小,性能很好,效果也还还行。于是我想有没其他的能在浏览器跑的人脸检测库,一查才发现OpenCV已经支持编译到WebAssembly,也就可以直接在...
  • both report the CPU load and alert you when the load peaks and/or is trending in the wrong direction: More Reading Wikipedia - A good, brief explanation of Load Average ; it goes a ...
  • Go 每日一库之 bubbletea

    2021-06-17 00:49:05
    为了让程序启动时,就去执行网络请求拉取 Trending 的列表,我们让模型的Init()方法返回一个tea.Cmd类型的值: func (m model) Init() tea.Cmd { return fetchTrending } func fetchTrending() tea.Msg { ...
  • below for options to speed up and slim down your network. Training on Your Own Categories If you've managed to get the script working on the flower example images, youcan start looking at ...
  • opencv.js 检测人脸

    千次阅读 2019-02-03 10:22:10
    最近picojs上了Github Trending,这是一个小巧的人脸检测库,200行JS,2K大小,性能很好,效果也还还行。于是我想有没其他的能在浏览器跑的人脸检测库,一查才发现OpenCV已经支持编译到WebAssembly,也就可以直接在...
  • t have to scroll down. It would still append it as new messages. <p>Example: ### New daily trending repos in Python #7 <p>Subscribe to this issue and stay notified about new <a href="#top">daily ...
  • This was identified by the team’s cycle time trending upwards. We asked some more Android developers to help with code reviews, cycle time was restored and our Android delivery cadence was healthy ...
  • :up-down_arrow_selector: 多种尺寸和选择样式 :scroll: 水平滚动无数段 :gear_selector: 带有字体,颜色,字距,阴影等文本属性的高级标题样式 :desktop_computer: 与Swift和Objective-C兼容 :mobile_phone: ...
  • This book funnels down the key information to provide the required expertise to the readers to enter the world of artificial intelligence, thus extending the knowledge of intermediate TensorFlow ...
  • Hexo-Matery主题细致美化

    万次阅读 多人点赞 2021-01-21 20:39:03
    Hexo-Matery主题美化 在一番瞎改js代码后,终于无法忍受next主题,于是愤然投入Matery大家庭,结果证明,香! 下面是我记录的配置Matery主题的流程,仅供后来的师傅们参考。 大家可以来我Hexo博客主页看看具体效果...
  • iOS GitHub上常用第三方框架

    千次阅读 2018-05-10 16:58:32
    目录UI下拉刷新模糊效果AutoLayout富文本图表表相关与Tabbar隐藏与显示HUD与Toast对话框其他UI动画侧滑与右滑返回手势gif动画其他动画网络相关网络连接图像获取网络聊天网络测试网页框架WebView与WKWebViewModel...
  • Chapter 1 Important Disclaimer The products referenced at this site are analytical tools only, and are not intended to replace individual research or licensed investment advice. Unique experienc...
  • iOS 强大第三方资源库

    千次阅读 2017-03-08 11:20:00
    这次主要增加了登录GitHub的功能,随手follow和star,并且增加发现模块,包括GitHub的trending,动态,showcases等。 Uther - 跟蠢萌的外星人聊天,还能帮你记事”。 itunes下载 。 高仿斗鱼TV - 高仿斗鱼TV,...

空空如也

空空如也

1 2 3 4 5 ... 20
收藏数 563
精华内容 225
关键字:

downtrending