Presentation
Overview
Meandre’s success relies on its ability to easily assemble data-intensive flows. Flows can be assembled with the Meandre Workbench (MW), an icon-based programming environment for assembling data-intensive flows. ZigZag targets rapid application development by speeding up the flow construction cycle.
ZigZag can accelerate the development cycle of data-intensive flows. It lets you describe a data-intensive flow in the ZigZag language, which can then be compiled into a self-contained flow task for later execution.
ZigZag is a simple language for describing data-intensive flows, modeled after Python’s simplicity. It is a declarative language for expressing the directed graphs (DG) that describe flows. A compiler is provided to transform a ZigZag program (.zz) into a Meandre self-contained task, or Meandre archive unit (.mau). MAUs can then be executed by a Meandre engine. Command-line tools are provided to compile and execute ZigZag files.
The language provides four basic constructs:
- Component discovering and aliasing (CDA): Retrieve components from a repository location and create an alias for them.
- Component instantiation (CI): Instantiate a component that will be part of the data-intensive flow.
- Instance modification (CM): Change the behavior of an instance through its properties.
- Instance invocation (II): Describe the data-intensive component relations with other components in the same flow.
Example
Flow diagram
The flow below pushes a sequence of ten strings that get converted to uppercase and finally printed to the console.
ZigZag code
The code below shows the ZigZag code that represents the data-intensive flow presented above.
    #
    # This flow converts a sequence of strings to uppercase
    # and then prints it to the console
    #
    # @author Xavier Llorà
    # @date March 7, 2008
    #
    # @file: example.zz
    #

    #
    # Imports the three required components and creates the component aliases
    #
    import <http://demo.seasr.org:1714/public/services/demo_repository.ttl>
    alias <meandre://test.org/component/push-string> as PUSH
    alias <meandre://test.org/component/to-uppercase> as TOUPPER
    alias <meandre://test.org/component/print-object> as PRINT

    #
    # Creates three instances for the flow
    #
    push_hello, to_upper, print = PUSH(), TOUPPER(), PRINT()

    #
    # Sets up the properties of the instances
    #
    push_hello.message, push_hello.times = "Hello World!!!", "10"

    #
    # Describes the data-intensive flow
    #
    @hello = push_hello()
    @upper = to_upper(string:hello.string)
    print(object:upper.string)
The code above can be compiled using zzc.jar and run using zzre.jar, or interactively from the zz console.
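To make the data path of example.zz concrete, here is a minimal Python sketch of the same three-stage flow. It is only an analogy under stated assumptions: the function names `push_string`, `to_uppercase`, `print_object`, and `run_flow` are hypothetical stand-ins for the three components, and nothing here reflects how the Meandre engine actually executes a flow.

```python
# Hypothetical stand-ins for the three components; this is not Meandre code.

def push_string(message, times):
    """Emit `message` `times` times, like the push-string instance."""
    for _ in range(times):
        yield message


def to_uppercase(strings):
    """Convert each incoming string to uppercase."""
    for s in strings:
        yield s.upper()


def print_object(objects, sink):
    """Print each object and also record it in `sink` for inspection."""
    for obj in objects:
        print(obj)
        sink.append(obj)


def run_flow(message="Hello World!!!", times=10):
    """Wire the stages in the same order as the ZigZag invocations."""
    printed = []
    hello = push_string(message, times)   # @hello = push_hello()
    upper = to_uppercase(hello)           # @upper = to_upper(string:hello.string)
    print_object(upper, printed)          # print(object:upper.string)
    return printed
```

Calling `run_flow()` pushes the message ten times through the uppercase stage and returns the list of printed values.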
Automatic Parallelization
Before digging into the details, imagine the following situation. You have a flow that does a good job, but as you keep pushing more and more data through it you realize that you could use multiple instances of the same component in parallel to boost the flow performance. This would also help max out all those cores you have sitting idle. Wouldn’t it be great if you could just say, for this component instance give me 4 copies that process data in parallel? Wouldn’t it also be great if you didn’t need to worry about connecting anything?
Let’s assume that in the previous flow example, the conversion to uppercase takes a really long time and dominates the overall execution. That would be a perfect example to illustrate the parallelization capabilities that ZigZag provides.
Unordered parallelization
Imagine now that you want a parallelized version of the component instance in the middle (the one that does most of the job). We can modify our ZigZag code to force the parallelization of that component instance. The modification looks like this:
    #
    # This flow converts a sequence of strings to uppercase
    # and then prints it to the console
    #
    # @author Xavier Llorà
    # @date March 7, 2008
    #
    # @file: example.zz
    #

    #
    # Imports the three required components and creates the component aliases
    #
    import <http://demo.seasr.org:1714/public/services/demo_repository.ttl>
    alias <meandre://test.org/component/push-string> as PUSH
    alias <meandre://test.org/component/to-uppercase> as TOUPPER
    alias <meandre://test.org/component/print-object> as PRINT

    #
    # Creates three instances for the flow
    #
    push_hello, to_upper, print = PUSH(), TOUPPER(), PRINT()

    #
    # Sets up the properties of the instances
    #
    push_hello.message, push_hello.times = "Hello World!!!", "10"

    #
    # Describes the data-intensive flow
    #
    @hello = push_hello()
    @upper = to_upper(string:hello.string)[+AUTO]
    print(object:upper.string)
The [+AUTO] tells the ZigZag compiler to parallelize the to_upper instance based on the underlying architecture. You can also specify the number of parallel instances you want; for instance, [+4] will create 4 parallel instances. The resulting flow generated by the compiler looks as follows:
Notice that ZigZag has created 4 parallel instances of the component. It has also introduced a mapper instance that is in charge of distributing the incoming data to each of the parallel instances. Each of the parallel instances then pushes its data straight to the print instance. That's it. The ZigZag compiler has parallelized and connected a new flow, with almost no effort on your part. This is called unordered parallelization, since data may arrive at the print instance out of the original order in which it was generated by the push component instance.
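The effect of unordered parallelization can be illustrated with a small Python sketch. This is an analogy only, not the code the ZigZag compiler generates: the thread pool plays the role of the mapper plus the parallel instances, and the names `unordered_parallel`, `to_uppercase`, and `copies` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def to_uppercase(s):
    """Stand-in for the slow component being parallelized."""
    return s.upper()


def unordered_parallel(items, worker, copies=4):
    """Hand items to `copies` parallel workers and collect results as
    each one finishes, so the output order is not guaranteed."""
    results = []
    with ThreadPoolExecutor(max_workers=copies) as pool:
        # The pool distributes the submitted items across the workers,
        # similar in spirit to the mapper instance.
        futures = [pool.submit(worker, item) for item in items]
        # as_completed yields futures in completion order, not submission order.
        for future in as_completed(futures):
            results.append(future.result())
    return results
```

Every input is processed exactly once, but the returned list may be a permutation of the input order, which mirrors what the print instance may observe.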
Ordered parallelization
Sometimes applications need to maintain the order of the data being pushed through the flow. ZigZag can also parallelize instances while preserving the order of the data going through the flow (at the cost of a little overhead). The same example presented above can be turned into an order-preserving one as follows:
    #
    # This flow converts a sequence of strings to uppercase
    # and then prints it to the console
    #
    # @author Xavier Llorà
    # @date March 7, 2008
    #
    # @file: example.zz
    #

    #
    # Imports the three required components and creates the component aliases
    #
    import <http://demo.seasr.org:1714/public/services/demo_repository.ttl>
    alias <meandre://test.org/component/push-string> as PUSH
    alias <meandre://test.org/component/to-uppercase> as TOUPPER
    alias <meandre://test.org/component/print-object> as PRINT

    #
    # Creates three instances for the flow
    #
    push_hello, to_upper, print = PUSH(), TOUPPER(), PRINT()

    #
    # Sets up the properties of the instances
    #
    push_hello.message, push_hello.times = "Hello World!!!", "10"

    #
    # Describes the data-intensive flow
    #
    @hello = push_hello()
    @upper = to_upper(string:hello.string)[+AUTO!]
    print(object:upper.string)
The ! after AUTO (or after the number of parallel instances you want) tells the compiler to generate a parallelized flow that maintains the data order. The picture below shows the resulting flow that guarantees the order.
ZigZag introduces a reducer after the parallel instances to guarantee the order.
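One common way a reducer can restore order is to tag each item with a sequence number and hold back results that arrive early. The Python sketch below shows that idea under stated assumptions; it is not taken from the ZigZag compiler, and the names `ordered_parallel`, `pending`, and `next_seq` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def ordered_parallel(items, worker, copies=4):
    """Process items on `copies` parallel workers, then emit the results
    in the original input order via a reordering buffer."""

    def tagged(seq, item):
        # Carry the sequence number alongside the result.
        return seq, worker(item)

    emitted = []
    pending = {}   # results that finished early, keyed by sequence number
    next_seq = 0   # sequence number the reducer must emit next
    with ThreadPoolExecutor(max_workers=copies) as pool:
        futures = [pool.submit(tagged, seq, item)
                   for seq, item in enumerate(items)]
        for future in as_completed(futures):
            seq, result = future.result()
            pending[seq] = result
            # Release every result that is now contiguous with the output.
            while next_seq in pending:
                emitted.append(pending.pop(next_seq))
                next_seq += 1
    return emitted
```

The buffer is the "little overhead" of ordering: a slow item delays everything queued behind it, even if later items are already done.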





