This repository was archived by the owner on Feb 22, 2020. It is now read-only.

Commit 5fdf67a

Merge pull request #341 from gnes-ai/doc-add-flow
docs(flow): add flow to readme as main api
2 parents 394a007 + ec03351 commit 5fdf67a

File tree

8 files changed: +2479 -330 lines changed


.github/0a3f26d8.png

618 KB

.github/mermaid-diagram-20191017172946.svg

Lines changed: 1328 additions & 0 deletions

.github/mermaid-diagram-20191017173106.svg

Lines changed: 665 additions & 0 deletions

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -1,4 +1,12 @@

+# 💥 Breaking Changes (`v0.0.45 -> v0.0.46`)
+
+The new GNES Flow API, introduced in `v0.0.46`, has become the main API of GNES. It provides a pythonic and intuitive way of building pipelines in GNES and lets you run and debug them on a local machine. It also supports graph visualization, swarm/k8s config export, etc. More information about GNES Flow can be found in [the documentation](http://doc.gnes.ai/en/latest/api/gnes.flow.html).
+
+As a consequence, the [`composer` module](/gnes/composer), the `gnes compose` CLI and the GNES board web UI will be removed in upcoming releases.
+
+GNES board will be redesigned using the GNES Flow API. We highly [welcome your contribution on this thread](CONTRIBUTING.md)!
+
 # Release Note (`v0.0.45`)
 > Release time: 2019-10-15 14:01:07
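
Since the Flow API is the headline of this release, a short illustration may help. The snippet below is only a rough sketch of how the indexing pipeline from the tutorial added in this commit might be written with `gnes.flow`; the import path, the `Flow`/`Service` names and the `build`/`index`/`to_swarm_yaml` calls are assumptions inferred from the description above, so consult the linked documentation for the actual interface.

```python
# Hypothetical sketch of the GNES Flow API described in the changelog above.
# All GNES-specific names here are assumptions; see
# http://doc.gnes.ai/en/latest/api/gnes.flow.html for the real interface.
from gnes.flow import Flow, Service as gfs  # assumed import path

flow = (Flow()
        .add(gfs.Preprocessor, yaml_path='text-prep.yml')
        .add(gfs.Encoder, yaml_path='gpt2.yml')
        .add(gfs.Indexer, yaml_path='b-indexer.yml'))

# Run and debug locally with a multi-process backend (assumed arguments),
# then export the same pipeline as a Docker Swarm compose file.
with flow.build(backend='process') as f:
    f.index(txt_file='test.txt')
print(flow.to_swarm_yaml())
```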

README.md

Lines changed: 139 additions & 328 deletions
Large diffs are not rendered by default.

docs/chapter/swarm-tutorial.md

Lines changed: 333 additions & 0 deletions
# Using GNES with Docker Swarm

### Build your first GNES app on local machine

Let's start with a typical indexing procedure by writing a YAML config (see the left column of the table):

<table>
<tr>
<th>YAML config</th><th>GNES workflow (generated by <a href="https://board.gnes.ai">GNES board</a>)</th>
</tr>
<tr>
<td width="30%">
<pre lang="yaml">
port: 5566
services:
- name: Preprocessor
  yaml_path: text-prep.yml
- name: Encoder
  yaml_path: gpt2.yml
- name: Indexer
  yaml_path: b-indexer.yml
</pre>
</td>
<td width="70%">
<img src=".github/mermaid-diagram-20190723165430.svg" alt="GNES workflow of example 1">
</td>
</tr>
</table>

Now let's see what the YAML config says. At first glance, it is pretty intuitive: it defines a pipeline workflow consisting of preprocessing, encoding and indexing, where the output of each component is the input of the next. This pipeline is a typical workflow of the *index* or *query* runtime. Each component is also associated with its own YAML config specifying how it should work. These configs are not important for understanding the big picture right now; nonetheless, curious readers can check out what each one looks like by expanding the items below.

<details>
<summary>Preprocessor config: text-prep.yml (click to expand...)</summary>

```yaml
!SentSplitPreprocessor
parameters:
  start_doc_id: 0
  random_doc_id: True
  deliminator: "[.!?]+"
gnes_config:
  is_trained: true
```
</details>

<details>
<summary>Encoder config: gpt2.yml (click to expand...)</summary>

```yaml
!PipelineEncoder
components:
  - !GPT2Encoder
    parameters:
      model_dir: $GPT2_CI_MODEL
      pooling_stragy: REDUCE_MEAN
    gnes_config:
      is_trained: true
  - !PCALocalEncoder
    parameters:
      output_dim: 32
      num_locals: 8
    gnes_config:
      batch_size: 2048
  - !PQEncoder
    parameters:
      cluster_per_byte: 8
      num_bytes: 8
gnes_config:
  work_dir: ./
  name: gpt2bin-pipe
```

</details>

<details>
<summary>Indexer config: b-indexer.yml (click to expand...)</summary>

```yaml
!BIndexer
parameters:
  num_bytes: 8
  data_path: /out_data/idx.binary
gnes_config:
  work_dir: ./
  name: bindexer
```
</details>

On the right side of the above table, you can see what the actual data flow looks like. There is an additional component, `gRPCFrontend`, automatically added to the workflow; it allows you to feed data and fetch results via the gRPC protocol through port `5566`.

Now it's time to run! [GNES board](https://board.gnes.ai) can automatically generate a starting script/config based on the YAML config you give, saving you the trouble of writing them on your own.

<p align="center">
<a href="https://gnes.ai">
<img src=".github/gnes-board-demo.gif?raw=true" alt="GNES Board">
</a>
</p>

> 💡 You can also start a GNES board locally. Simply run `docker run -d -p 0.0.0.0:80:8080/tcp gnes/gnes compose --serve`

As a cloud-native application, GNES requires an **orchestration engine** to coordinate all micro-services. We support Kubernetes, Docker Swarm and a shell-based multi-process mode. Let's see what the generated script looks like in this case.

<details>
<summary>Shell-based starting script (click to expand...)</summary>

```bash
#!/usr/bin/env bash
set -e

trap 'kill $(jobs -p)' EXIT

printf "starting service gRPCFrontend with 0 replicas...\n"
gnes frontend --grpc_port 5566 --port_out 49668 --socket_out PUSH_BIND --port_in 60654 --socket_in PULL_CONNECT &
printf "starting service Preprocessor with 0 replicas...\n"
gnes preprocess --yaml_path text-prep.yml --port_in 49668 --socket_in PULL_CONNECT --port_out 61911 --socket_out PUSH_BIND &
printf "starting service Encoder with 0 replicas...\n"
gnes encode --yaml_path gpt2.yml --port_in 61911 --socket_in PULL_CONNECT --port_out 49947 --socket_out PUSH_BIND &
printf "starting service Indexer with 0 replicas...\n"
gnes index --yaml_path b-indexer.yml --port_in 49947 --socket_in PULL_CONNECT --port_out 60654 --socket_out PUSH_BIND &

wait
```
</details>

<details>
<summary>DockerSwarm compose file (click to expand...)</summary>

```yaml
version: '3.4'
services:
  gRPCFrontend00:
    image: gnes/gnes-full:latest
    command: frontend --grpc_port 5566 --port_out 49668 --socket_out PUSH_BIND --port_in
      60654 --socket_in PULL_CONNECT --host_in Indexer30
    ports:
      - 5566:5566
  Preprocessor10:
    image: gnes/gnes-full:latest
    command: preprocess --port_in 49668 --socket_in PULL_CONNECT
      --port_out 61911 --socket_out PUSH_BIND --yaml_path /Preprocessor10_yaml --host_in
      gRPCFrontend00
    configs:
      - Preprocessor10_yaml
  Encoder20:
    image: gnes/gnes-full:latest
    command: encode --port_in 61911 --socket_in PULL_CONNECT
      --port_out 49947 --socket_out PUSH_BIND --yaml_path /Encoder20_yaml --host_in
      Preprocessor10
    configs:
      - Encoder20_yaml
  Indexer30:
    image: gnes/gnes-full:latest
    command: index --port_in 49947 --socket_in PULL_CONNECT
      --port_out 60654 --socket_out PUSH_BIND --yaml_path /Indexer30_yaml --host_in
      Encoder20
    configs:
      - Indexer30_yaml
volumes: {}
networks:
  gnes-net:
    driver: overlay
    attachable: true
configs:
  Preprocessor10_yaml:
    file: text-prep.yml
  Encoder20_yaml:
    file: gpt2.yml
  Indexer30_yaml:
    file: b-indexer.yml
```
</details>

For the sake of simplicity, we will just use the generated shell script to start GNES. Create a new file, say `run.sh`, copy the script into it and run it via `$ bash ./run.sh`. You should see output like the following:

<p align="center">
<a href="https://gnes.ai">
<img src=".github/shell-success.svg" alt="success running GNES in shell">
</a>
</p>

This indicates that the GNES app is up and waiting for incoming data. You may now feed data to it through the `gRPCFrontend`. Depending on your language (Python, C, Java, Go, HTTP, Shell, etc.) and the content type (image, video, text, etc.), the data-feeding part can differ slightly; a rough Python sketch follows below.
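
As an illustration only, here is a minimal Python sketch of what feeding data over gRPC might look like. The GNES-specific names used in it (`gnes.proto`, `gnes_pb2`, `gnes_pb2_grpc`, `GnesRPCStub`, the `Request` layout) are assumptions made for this sketch rather than a confirmed client API; only the generic `grpc.insecure_channel` call is standard gRPC.

```python
# Illustrative sketch only: the GNES stub/message names below are assumptions.
import grpc

from gnes.proto import gnes_pb2, gnes_pb2_grpc  # assumed module path

with grpc.insecure_channel('localhost:5566') as channel:
    stub = gnes_pb2_grpc.GnesRPCStub(channel)   # assumed stub name
    req = gnes_pb2.Request()                    # assumed request message
    doc = req.index.docs.add()                  # assumed field layout
    doc.raw_bytes = b'hello world'              # assumed field name
    print(stub.Call(req))                       # assumed unary RPC
```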

To stop a running GNES, simply press <kbd>control</kbd> + <kbd>c</kbd>.


### Scale your GNES app to the cloud

Now let's juice it up a bit. To be honest, building a single-machine, process-based pipeline is not that impressive. The true power of GNES is that you can scale any component whenever you want. Encoding is slow? Add more machines. Preprocessing takes too long? More machines. Index file too large? Add shards, i.e. more machines!

In this example, we compose a more complicated GNES workflow for images. This workflow consists of multiple preprocessors, multiple encoders and two types of indexers: one storing the encoded binary vectors, the other storing the original images, i.e. a full-text index. The two indexers work in parallel. Check out the YAML config on the left side of the table for more details, and note how `replicas` is defined for each component.

<table>
<tr>
<th>YAML config</th><th>GNES workflow (generated by <a href="https://board.gnes.ai">GNES board</a>)</th>
</tr>
<tr>
<td width="30%">
<pre lang="yaml">
port: 5566
services:
- name: Preprocessor
  replicas: 2
  yaml_path: image-prep.yml
- name: Encoder
  replicas: 3
  yaml_path: incep-v3.yml
- name: Indexer
  yaml_path: faiss.yml
  replicas: 4
- name: Indexer
  yaml_path: fulltext.yml
  replicas: 3
</pre>
</td>
<td width="70%">
<a href="https://gnes.ai">
<img src=".github/mermaid-diagram-20190723191407.svg" alt="GNES workflow of example 2">
</a>
</td>
</tr>
</table>

You may notice that besides the `gRPCFrontend`, multiple `Router`s have been added to the workflow. Routers serve as message brokers between microservices, determining how and where messages are received and sent. In the previous pipeline example the data flow was so simple that no router was needed. In this example, routers are necessary for connecting multiple preprocessors and encoders; otherwise the preprocessors wouldn't know where to send their messages. GNES Board automatically adds routers to the workflow when necessary, based on the types of two consecutive layers. It may also add stacked routers, as you can see between the encoders and indexers in the graph on the right.

Again, the detailed YAML config of each component is not important for understanding the big picture, so we omit it for now.

This time we will run GNES via Docker Swarm. To do that, simply copy the generated Docker Swarm YAML config into a file, say `my-gnes.yml`, and then run:
```bash
docker stack deploy --compose-file my-gnes.yml gnes-531
```

Note that `gnes-531` is the name of your GNES stack; keep it in mind. If you forget it, you can always use `docker stack ls` to look it up. To check whether the whole stack is running successfully, use `docker service ls -f name=gnes-531`; a replica count of `1/1` or `4/4` indicates everything is fine.

Generally, a complete and successful Docker Swarm start-up should look like the following:

<p align="center">
<a href="https://gnes.ai">
<img src=".github/swarm-success.svg" alt="success running GNES in shell">
</a>
</p>


When the GNES stack is ready and waiting for incoming data, you may feed data to it through the `gRPCFrontend`, just as in the local example above. Depending on your language (Python, C, Java, Go, HTTP, Shell, etc.) and the content type (image, video, text, etc.), the data-feeding part can differ slightly.

To stop a running GNES stack, you can use `docker stack rm gnes-531`.


### Customize GNES to your needs

With the help of GNES Board, you can easily compose a GNES app for different purposes. The table below summarizes some common compositions and their corresponding workflow visualizations. Note that we hide the component-wise YAML configs (i.e. `yaml_path`) for the sake of clarity; a rough Flow-API sketch of the last composition is given after the table.

<table>
<tr>
<th>YAML config</th><th>GNES workflow (generated by <a href="https://board.gnes.ai">GNES board</a>)</th>
</tr>
<tr>
<td width="30%">
Parallel preprocessing only
<pre lang="yaml">
port: 5566
services:
- name: Preprocessor
  replicas: 2
</pre>
</td>
<td width="70%">
<a href="https://gnes.ai">
<img src=".github/mermaid-diagram-20190724110437.svg" alt="GNES workflow of example 3" width="50%">
</a>
</td>
</tr>
<tr>
<td width="30%">
Training an encoder
<pre lang="yaml">
port: 5566
services:
- name: Preprocessor
  replicas: 3
- name: Encoder
</pre>
</td>
<td width="70%">
<a href="https://gnes.ai">
<img src=".github/mermaid-diagram-20190724111007.svg" alt="GNES workflow of example 4" width="70%">
</a>
</td>
</tr>
<tr>
<td width="30%">
Index-time with 3 vector-index shards
<pre lang="yaml">
port: 5566
services:
- name: Preprocessor
- name: Encoder
- name: Indexer
  replicas: 3
</pre>
</td>
<td width="70%">
<a href="https://gnes.ai">
<img src=".github/mermaid-diagram-20190724111344.svg" alt="GNES workflow of example 5" width="90%">
</a>
</td>
</tr>
<tr>
<td width="30%">
Query-time with 2 vector-index shards followed by 3 full-text-index shards
<pre lang="yaml">
port: 5566
services:
- name: Preprocessor
- name: Encoder
- name: Indexer
  income: sub
  replicas: 2
- name: Indexer
  income: sub
  replicas: 3
</pre>
</td>
<td width="70%">
<a href="https://gnes.ai">
<img src=".github/mermaid-diagram-20190724112032.svg" alt="GNES workflow of example 6">
</a>
</td>
</tr>
</table>
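
For readers who prefer the new Flow API announced in the changelog above, compositions like these would presumably be written in Python rather than YAML. The snippet below is a rough sketch of the last row (query-time with sharded indexers); the import path, the `Service` enum and the `add`/`replicas`/`to_swarm_yaml` names are assumptions for illustration, so check the Flow documentation for the actual interface.

```python
# Hypothetical Flow-API equivalent of the last composition above.
# Import path, Service enum and keyword arguments are assumptions for illustration.
from gnes.flow import Flow, Service as gfs

flow = (Flow()
        .add(gfs.Preprocessor)
        .add(gfs.Encoder)
        .add(gfs.Indexer, replicas=2)   # vector-index shards
        .add(gfs.Indexer, replicas=3))  # full-text-index shards

# Export the pipeline as a Docker Swarm compose file (assumed method name).
print(flow.to_swarm_yaml())
```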
332+
333+

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -69,6 +69,7 @@ Highlights
    chapter/troubleshooting.md
    chapter/protobuf-dev.md
    chapter/enviromentvars.md
+   chapter/swarm-tutorial.md

 .. toctree::
    :maxdepth: 2

release.sh

Lines changed: 5 additions & 2 deletions
@@ -3,6 +3,7 @@
 # Requirements
 # brew install hub
 # npm install -g git-release-notes
+# pip install twine

 set -e

@@ -95,6 +96,10 @@ if [[ -z "${BOT_URL}" ]]; then
   exit 1;
 fi

+if [[ -z "${GITHUB_TOKEN}" ]]; then
+  printf "GITHUB_TOKEN is not set! Need to export GITHUB_TOKEN=xxx"
+  exit 1;
+fi

 #$(grep "$VER_TAG" $CLIENT_CODE | sed -n 's/^.*'\''\([^'\'']*\)'\''.*$/\1/p')
 OLDVER=$(git tag -l | sort -V |tail -n1)

@@ -116,8 +121,6 @@ then
   change_line "$VER_TAG" "$VER_VAL" $INIT_FILE
   pub_pypi
   pub_gittag
-  # change the version line back
-  # mv ${TMP_INIT_FILE} $INIT_FILE
   make_chore_pr $VER
 fi
