Network Reliability Engineering Community

JET lesson timeout in local setup

I’ve set up Antidote locally on a MacBook Pro running Mojave.
I followed this page for the setup: https://antidoteproject.readthedocs.io/en/latest/building/local.html#buildlocal

This JET lesson fails for me with the following error message:

An error occurred.

Timeout waiting for lesson to become ready.

Unfortunately this means we can’t show you this lesson right now. Please post the details of what lesson you were accessing, as well as the above error message as a Github issue. In the meantime, try refreshing this page, or searching for another lesson.

When I click on this JET lesson, I initially get the message
“Waiting for lesson endpoints to become reachable…(0/2)”. It moves on fine to the next step,
“Waiting for lesson endpoints to become reachable…(1/2)”, and that is where it times out and throws the above error.

I tried a couple of other lessons, and they work fine.
http://antidote-local:30001/labs/?lessonId=24&lessonStage=1 [Junos Automation with PyEz]
http://antidote-local:30001/labs/?lessonId=12&lessonStage=1 [Unit Testing Networks with JSNApy]

Some more info on the setup status

sbattu-mbp:antidote-selfmedicate sbattu$ kubectl get pods

NAME                                        READY   STATUS    RESTARTS   AGE
antidote-web-57f98b78d4-f5vg9               2/2     Running   10         22d
nginx-ingress-controller-6f575d4f84-nc96h   1/1     Running   6          22d
syringe-6ffd7b7ccc-x4ts9                    1/1     Running   5          22d
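As a side note, the READY column above can be checked mechanically. Here is a minimal sketch (the `count_ready` helper is my own, not part of Antidote) that counts pods whose READY fraction is complete in `kubectl get pods` output:

```shell
# count_ready: count pods whose READY column (e.g. "2/2") shows all
# containers ready. Reads `kubectl get pods` output on stdin.
count_ready() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] == r[2]) n++ } END { print n + 0 }'
}
# Usage: kubectl get pods | count_ready
```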

sbattu-mbp:antidote-selfmedicate sbattu$ minikube ssh docker image list

REPOSITORY                                          TAG                 IMAGE ID            CREATED             SIZE
antidotelabs/configurator                           latest              da8050551eff        2 weeks ago         1.13GB
weaveworks/weave-kube                               2.5.2               f04a043bb67a        3 weeks ago         148MB
weaveworks/weave-npc                                2.5.2               5ce48e0d813c        3 weeks ago         49.6MB
antidotelabs/antidote-web                           latest              eef58d80c892        3 weeks ago         495MB
antidotelabs/syringe                                latest              c67f8a2d456e        4 weeks ago         94.1MB
antidotelabs/vqfx-full                              18.1R1.9            4c98bcb1f462        7 weeks ago         1.3GB
antidotelabs/utility                                latest              88556564426b        2 months ago        784MB
k8s.gcr.io/kube-apiserver                           v1.13.3             fe242e556a99        4 months ago        181MB
k8s.gcr.io/kube-proxy                               v1.13.3             98db19758ad4        4 months ago        80.3MB
k8s.gcr.io/kube-controller-manager                  v1.13.3             0482f6400933        4 months ago        146MB
k8s.gcr.io/kube-scheduler                           v1.13.3             3a6f709e97a0        4 months ago        79.6MB
antidotelabs/githelper                              latest              1b37ec568c2d        4 months ago        23MB
antidotelabs/vqfx                                   snap3               4464bcbaddd0        4 months ago        1.36GB
antidotelabs/vqfx                                   snap2               cceda1869806        4 months ago        1.24GB
antidotelabs/vqfx                                   snap1               234df3e30587        4 months ago        1.23GB
guacamole/guacd                                     latest              57f6ce568e0d        5 months ago        395MB
k8s.gcr.io/coredns                                  1.2.6               f59dcacceff4        7 months ago        40MB
k8s.gcr.io/etcd                                     3.2.24              3cab8e1b9802        8 months ago        220MB
nfvpe/multus                                        v3.1                d9e7bffad290        9 months ago        477MB
k8s.gcr.io/kube-addon-manager                       v8.6                9c16409588eb        15 months ago       78.4MB
k8s.gcr.io/pause                                    3.1                 da86e6ba6ca1        17 months ago       742kB
gcr.io/k8s-minikube/storage-provisioner             v1.8.1              4689081edb10        19 months ago       80.8MB
gcr.io/google_containers/nginx-ingress-controller   0.9.0-beta.5        bd694199976d        2 years ago         121MB

I see this warning in the events for the antidote-web pod:

sbattu-mbp:antidote-selfmedicate sbattu$ kubectl describe pods antidote-web-57f98b78d4-f5vg9

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Warning  Unhealthy  59m (x3 over 62m)  kubelet, minikube  Readiness probe failed: Get http://10.32.0.11:8080/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

sbattu-mbp:antidote-selfmedicate sbattu$ minikube version
minikube version: v0.34.1

Please let me know if any additional information will be helpful to debug further.

Please provide the Syringe logs.

Hi Matt,
Sorry, I got caught up in some priority work and didn’t get a chance to replicate this sooner. I’ve added the logs output in a file here: https://gist.github.com/rambattu/3e14bd72866bc290d632874a973b7d41

Best,

-Ram.

Hmmm…nothing looks wrong from Syringe; it just seems like it’s waiting on the endpoint to come up. Note that this is the vqfx-full image, which will take quite a bit longer to start. How many times have you tried to start the lesson, and is it this way each time? If you only tried it once, you more than likely had to wait a while for the image to download first, before it even started the boot process.

FWIW while I’m certainly seeing long boot times as expected, it seems to be coming up for me. The vqfx logs might provide some clues - here are the last four lines in a normal boot for that image:

Starting cosim ...
Booting VCP ...
mkdir: cannot create directory '/root/pecosim': File exists
cosim logs will be created in /root/pecosim

At some point after that shows, the image should respond to health checks, but like I said, it will be a few minutes. Your next step should be to verify that you’re seeing this.
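That last-lines check can be scripted if it’s easier. A sketch (the `saw_boot_marker` helper and the grep pattern are my own, illustrative only) that looks for the boot marker in piped vqfx logs:

```shell
# saw_boot_marker: check piped vqfx logs for the "Booting VCP" line that
# precedes the image answering health checks.
saw_boot_marker() {
  if grep -q 'Booting VCP'; then
    echo "VCP boot started"
  else
    echo "boot marker not seen yet"
  fi
}
# Usage: kubectl logs -n <namespace of lesson> vqfx | saw_boot_marker
```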

If you are, I would be interested to see if you could SSH to the vqfx from the other pod before Syringe thinks it’s online. Try this (substituting your own lesson namespace):

kubectl -n=25-0mmd27lzvquddz8k-ns exec -it linux /bin/bash
root@linux:/# ssh antidote@vqfx
Password: antidotepassword
Last login: Sun Jun 16 05:31:10 2019 from 10.32.27.54
--- JUNOS 18.1R1.9 built 2018-03-23 20:37:00 UTC
{master:0}
antidote@vqfx>

Try that and let me know what behavior you see.

Hi Matt,

I’ve tried a few times; typically after 15 to 20 minutes I see the message “Timeout waiting for lesson to become ready.” I haven’t timed it exactly, but I can if that helps. I’ve also tried refreshing multiple times, without much help. I checked the Syringe logs, and they look the same as what I posted earlier.

How can I check the vqfx logs? Is there a way to get to a console to see where it is hanging?

Thank you,
-Ram.

kubectl logs -n=<namespace of lesson> vqfx -f will tail the logs of the vqfx pod. The output should be something similar to what I posted above.

In my last post I gave a step you could try while the lesson is loading. You should be able to get into the linux pod fairly quickly, and before the front-end times out, try to get into the vqfx pod from there. Can you give that a try?

Taking a step back, it’s possible that the hardware you’re using just isn’t fast enough to boot in the allotted time. The full vqfx image is pretty heavyweight; it’s one reason we don’t use it much, and for lessons like this one that require it, we’re moving NRE Labs to a baremetal provider. It’s kind of a pain, but necessary. If all of the above fails, see what happens when you try to run it using just Docker, so that none of Antidote’s timeouts apply.

Something like

docker run --privileged -p 2222:22 antidotelabs/vqfx-full:18.1R1.9

and see how long it takes for you to be able to ssh to antidote@127.0.0.1 -p 2222

Another thing you can do is use the QEMU serial port inside the container to view the actual VM logs. First, enter a bash prompt in the vqfx container:

kubectl -n=<namespace of lesson> exec -it vqfx /bin/bash

Then, telnet to the serial port:

telnet 127.0.0.1 5000

Hi Matt,

1) I tried to log in to the Linux pod; the SSH connection timed out. I tried multiple times:

sbattu-mbp:antidote-selfmedicate sbattu$ kubectl -n=25-66n1ihh28m8yslto-ns exec -it linux /bin/bash
root@linux:/# ssh antidote@vqfx
ssh: connect to host vqfx port 22: Connection timed out

2) The vqfx logs showed no output at all:

sbattu-mbp:antidote-selfmedicate sbattu$ kubectl logs -n=25-66n1ihh28m8yslto-ns vqfx -f
sbattu-mbp:antidote-selfmedicate sbattu$ kubectl logs -n=25-66n1ihh28m8yslto-ns vqfx -f
sbattu-mbp:antidote-selfmedicate sbattu$ kubectl logs -n=25-66n1ihh28m8yslto-ns vqfx -f

3) The attempt to get serial port logs failed with this error:

sbattu-mbp:antidote-selfmedicate sbattu$ kubectl -n=25-66n1ihh28m8yslto-ns exec -it vqfx /bin/bash
error: unable to upgrade connection: container not found ("vqfx")

4) The local Docker attempt currently fails as shown below; I have to look into it further:

sbattu-mbp:~ sbattu$ docker run --privileged -p 2222:22 antidotelabs/vqfx-full:18.1R1.9
Juniper Networks vQFX Docker Light Container
/u contains the following files:
jinstall-vqfx-10-f-18.1R1.9.img
junos-openconfig-0.0.0.10-1-signed.tgz
using qcow2 image jinstall-vqfx-10-f-18.1R1.9.img
8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
no access to /var/run/docker.sock
Interface  IPv6 address
Bridging  with em0
Current MAC:   02:42:ac:11:00:02 (unknown)
Permanent MAC: 00:00:00:00:00:00 (XEROX CORPORATION)
New MAC:       00:d0:86:d6:0a:ec (FOVEON, INC.)
ls: cannot access 'id_*.pub': No such file or directory
WARNING: Can't read ssh public key file . Creating user 'antidote' with same password as root
default route:
mygw=
-----------------------------------------------------------------------
fcc25981d7eb (172.17.0.2) 18.1R1.9 root password antidotepassword
-----------------------------------------------------------------------

Creating empty /tmp/vqfxhdd.img for VCP ...
checking for eth/net ...
done
walking the network list
=========
NETDEVS=
highest mtu 1500
=========
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
3: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/tunnel6 :: brd ::
4: em0@eth0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 500
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
5: em1: <BROADCAST,MULTICAST,PROMISC,UP> mtu 9500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether a2:f1:36:6f:e9:2c brd ff:ff:ff:ff:ff:ff
6: em2: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 72:e1:14:d6:31:2d brd ff:ff:ff:ff:ff:ff
8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Creating config drive
METADISK=/tmp/configdrive.qcow2 CONFIG=/tmp/config.txt
Creating config drive (configdrive.img) ...
adding /u/junos-openconfig-0.0.0.10-1-signed.tgz
adding config file /tmp/config.txt
50+0 records in
50+0 records out
52428800 bytes (52 MB, 50 MiB) copied, 0.166627 s, 315 MB/s
mkfs.fat 4.1 (2017-01-24)
-rw-r--r-- 1 root root 3604480 Jun 18 20:58 /tmp/configdrive.qcow2
Starting cosim ...
Booting VCP ...
mkdir: cannot create directory '/root/pecosim': File exists
cosim logs will be created in /root/pecosim
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory

You will likely need to run the docker command in a Linux VM or something similar.
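For what it’s worth, the “Could not access KVM kernel module” error above means the container can’t see /dev/kvm, which Docker Desktop on macOS doesn’t expose. A quick sketch (my own helper, nothing Antidote-specific) to check a host before trying the vqfx image:

```shell
# kvm_status: report whether the KVM device node exists. The vqfx VM needs
# hardware virtualization; without /dev/kvm, QEMU fails as shown above.
# An optional argument overrides the device path, mainly for testing.
kvm_status() {
  if [ -e "${1:-/dev/kvm}" ]; then
    echo "kvm available"
  else
    echo "no kvm"
  fi
}
# Usage: kvm_status    # checks /dev/kvm on the current host
```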

Regarding the kube pods, can you give the output of kubectl describe?

Here is the kubectl describe pods output, Matt.

I’ll try the docker command on a Linux VM; this evening I ran it on my MBP.

Sorry, I was being overly terse. I need the description of the lesson pods, specifically the vqfx pod, so you’ll need to provide the namespace flag with the namespace of the lesson (you can see these using kubectl get ns).
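For reference, lesson namespaces in this thread look like <lessonId>-<random>-ns (that naming is inferred from the thread, not from any documentation). A sketch of a helper to pick one out of kubectl get ns output:

```shell
# lesson_ns: print the first namespace matching "<lessonId>-...-ns" from
# `kubectl get ns --no-headers` output on stdin. The naming pattern is
# inferred from this thread, not documented.
lesson_ns() {
  awk -v id="$1" '$1 ~ "^" id "-.*-ns$" { print $1; exit }'
}
# Usage: kubectl describe pods -n "$(kubectl get ns --no-headers | lesson_ns 25)"
```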

No problem, and thank you for the detailed steps and patient follow-up on this thread, Matt.
Here’s a new gist that I generated with “kubectl describe pods --namespace 25-66n1ihh28m8yslto-ns”:

https://gist.github.com/rambattu/f9f8fe65efe008e8a0da41c14fd34cef

These lines seem to highlight some warnings for vqfx:

Warning  Failed  15m  kubelet, minikube  Error: failed to start container "vqfx": Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/d5851e98-931d-11e9-a7e5-08002722a72c/volume-subpaths/git-volume/vqfx/0\\\" to rootfs \\\"/var/lib/docker/overlay2/98f0ba05dff33fa93c7dfe0c7c695ff4f731614385c2331f779ad26d114df84b/merged\\\" at \\\"/var/lib/docker/overlay2/98f0ba05dff33fa93c7dfe0c7c695ff4f731614385c2331f779ad26d114df84b/merged/antidote\\\" caused \\\"no such file or directory\\\"\"": unknown
Warning  Failed  15m  kubelet, minikube  Error: failed to start container "vqfx": Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/d5851e98-931d-11e9-a7e5-08002722a72c/volume-subpaths/git-volume/vqfx/0\\\" to rootfs \\\"/var/lib/docker/overlay2/3b4ba68d611ce1a3dad2d30b02a6bded563dbe5ec708fe1ca166f1259ca5dc83/merged\\\" at \\\"/var/lib/docker/overlay2/3b4ba68d611ce1a3dad2d30b02a6bded563dbe5ec708fe1ca166f1259ca5dc83/merged/antidote\\\" caused \\\"no such file or directory\\\"\"": unknown