This repository has been archived by the owner on Mar 23, 2019. It is now read-only.

since approx 2018-06-05, in-docker-container ansible-container build fails with "ansible.errors.AnsibleError: the role '<rolename>' was not found in <rolespath>" on different roles depending on environment #942

Open
dchsueh opened this issue Jun 7, 2018 · 5 comments

@dchsueh

dchsueh commented Jun 7, 2018

ISSUE TYPE
  • Bug Report

container.yml

This is a reasonably small example I created to demonstrate the problem. (Yes it fails.)

version: '2'

settings:
  project_name: buildbox
  conductor:
    base: 'centos:7'

services:
  base:
    from: centos:7
    roles:
      - BuildBox/Base
      - BuildBox/Configuration1
      - BuildBox/Configuration2
      - BuildBox/Configuration3
      - BuildBox/Configuration4
    working_dir: /tmp
    ports:
      - '22'
    command:
      - /usr/sbin/sshd
      - -D

Individual roles have a tasks/main.yml of the form

---
- command: echo BASE

substitute BASE for ONE, TWO, THREE, FOUR to match role
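
For anyone who wants to reproduce this, the role layout described above could be generated with a small script along these lines. This is only a sketch: the directory layout (<roles root>/BuildBox/<role>/tasks/main.yml) is my assumption based on the role names in container.yml, and create_roles and ROLE_MARKERS are made-up names for illustration.

```python
import os

# Role directory name -> word echoed by its single task, per the description above.
ROLE_MARKERS = [
    ("Base", "BASE"),
    ("Configuration1", "ONE"),
    ("Configuration2", "TWO"),
    ("Configuration3", "THREE"),
    ("Configuration4", "FOUR"),
]


def create_roles(roles_root):
    """Create BuildBox/<role>/tasks/main.yml under roles_root; return the files written."""
    created = []
    for role, marker in ROLE_MARKERS:
        tasks_dir = os.path.join(roles_root, "BuildBox", role, "tasks")
        if not os.path.isdir(tasks_dir):
            os.makedirs(tasks_dir)
        main_yml = os.path.join(tasks_dir, "main.yml")
        with open(main_yml, "w") as handle:
            handle.write("---\n- command: echo %s\n" % marker)
        created.append(main_yml)
    return created


# Example (hypothetical location): create_roles("/tmp/buildbox-roles")
# then pass the same directory to --roles-path.
```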

OS / ENVIRONMENT

The environment for a virtualenv ansible-container install direct on ubuntu xenial:

Ansible Container, version 0.9.2
Linux, dhsueh-ubuntu, 4.13.0-43-generic, #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018, x86_64
2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] <virtualenv directory path>/bin/python2

Believed-identical environment configured as a Dockerfile-built docker container "FROM ubuntu:xenial":

Ansible Container, version 0.9.2
Linux, b92df59f4255, 4.13.0-43-generic, #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018, x86_64
2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] /usr/bin/python

(I have tried a "FROM centos:7" version as well - no difference.)

My environments are set up pinned to 0.9.2 with various workarounds applied as I encountered the need for them (ubuntu paths below):

pip --disable-pip-version-check install pip==9.0.3
pip --disable-pip-version-check install setuptools==39.2.0
pip --disable-pip-version-check install docker==2.7.0
pip --disable-pip-version-check install ansible-container[docker]==0.9.2
sed -i "s/filters={'name': self.secrets_volume_name}//g" /usr/local/lib/python2.7/dist-packages/container/docker/secrets.py
sed -i "s/return os.path.join(os.sep, 'run', 'secrets')/return os.path.join(os.sep, 'docker', 'secrets')/g" /usr/local/lib/python2.7/dist-packages/container/docker/engine.py

The pip docker==2.7.0 pin is a workaround that I can't find a reference for now (?!?!)
The sed on filters is a workaround for the ansible-container bug described in moby/moby#34121
The sed on return is a workaround for #762

SUMMARY

Heads up: The observed behavior is strikingly similar to #673 but does not involve any cloud-enabled roles; all requested roles are confirmed to exist on the filesystem in the single path specified in the --roles-path option.

I have many services, each with many different roles listed. Before 2018-06-05 everything was working fine on a particular docker host. On 2018-06-05 I added an extra role to my services, at the end of the list (e.g. "BuildBox/Configuration4"), which resulted in different failures depending on the environment.

In a direct-on-iron ansible-container virtualenv environment created after the problem date, an "ansible-container build" call completes fine.

Depending on the docker host I run an ansible-container docker image on, I get an error like:

2018-06-07T18:00:35.723801 Processing defaults section... [container.config] caller_file=/_ansible/container/config.py caller_func=_process_defaults caller_line=325
2018-06-07T18:00:35.726157 Processing section...          [container.config] caller_file=/_ansible/container/config.py caller_func=_process_top_level_sections caller_line=334 section=volumes
2018-06-07T18:00:35.728781 Processing section...          [container.config] caller_file=/_ansible/container/config.py caller_func=_process_top_level_sections caller_line=334 section=registries
2018-06-07T18:00:35.731282 Processing section...          [container.config] caller_file=/_ansible/container/config.py caller_func=_process_top_level_sections caller_line=334 section=secrets
2018-06-07T18:00:35.733772 Processing service...          [container.config] caller_file=/_ansible/container/config.py caller_func=_process_services caller_line=340 service=u'base' service_data={u'command': [u'/usr/sbin/sshd', u'-D'], u'working_dir': u'/tmp', u'from': u'centos:7', u'ports': [u'22'], u'roles': [u'BuildBox/Base', u'BuildBox/Configuration1', u'BuildBox/Configuration2', u'BuildBox/Configuration3', u'BuildBox/Configuration4']}
Traceback (most recent call last):
  File "/usr/bin/conductor", line 11, in <module>
    load_entry_point('ansible-container', 'console_scripts', 'conductor')()
  File "/_ansible/container/__init__.py", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/cli.py", line 389, in conductor_commandline
    conductor_config = AnsibleContainerConductorConfig(list_to_ordereddict(containers_config))
  File "/_ansible/container/__init__.py", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/config.py", line 297, in __init__
    self._process_services()
  File "/_ansible/container/config.py", line 357, in _process_services
    role_metadata = get_metadata_from_role(role_name)
  File "/_ansible/container/__init__.py", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/utils/__init__.py", line 275, in get_metadata_from_role
    return get_content_from_role(role_name, os.path.join('meta', 'container.yml'))
  File "/_ansible/container/__init__.py", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/utils/__init__.py", line 264, in get_content_from_role
    role_path = resolve_role_to_path(role_name)
  File "/_ansible/container/__init__.py", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/utils/__init__.py", line 210, in resolve_role_to_path
    loader=loader)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/role/include.py", line 59, in load
    return ri.load_data(data, variable_manager=variable_manager, loader=loader)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/base.py", line 244, in load_data
    ds = self.preprocess_data(ds)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/role/definition.py", line 94, in preprocess_data
    (role_name, role_path) = self._load_role_path(role_name)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/role/definition.py", line 187, in _load_role_path
    raise AnsibleError("the role '%s' was not found in %s" % (role_name, ":".join(role_search_paths)), obj=self._ds)
ansible.errors.AnsibleError: the role '<NOTFOUNDROLE>' was not found in ./roles:<AC_ROLES_PATH>:/src/roles:/etc/ansible/roles:.

The <AC_ROLES_PATH> is the path provided in the ansible-container --roles-path option.

The missing <NOTFOUNDROLE> role is, at times:

  • when using docker container running on host for the first time post 2018-06-05:
    • the first role in the container.yml listing ("BuildBox/Base")
    • removing that role simply results in failing to find the new first role
  • when using docker container running on host working successfully previous to 2018-06-05:
    • the last role in the container.yml listing ("BuildBox/Configuration4")
    • if I remove the last role, making the list match what was working previous to 2018-06-05, the build completes fine

In all cases I can confirm all roles are present on the local / in-container filesystem before the ansible-container call.
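
A check like the one described can be scripted. The sketch below walks the search paths in the order printed in the error message and reports where each role directory is found. It is a simplified approximation, not Ansible's actual lookup code, and find_role, report_roles and "/path/to/roles" are placeholders of mine. Running it both inside the ansible-container docker container and on the docker host may show whether the two sides see the same directories.

```python
import os


def find_role(role_name, search_paths):
    """Return the first existing directory for role_name under search_paths, else None."""
    for base in search_paths:
        candidate = os.path.join(os.path.abspath(base), role_name)
        if os.path.isdir(candidate):
            return candidate
    return None


def report_roles(role_names, search_paths):
    """Map each role name to the directory it resolves to (None if missing)."""
    return dict((name, find_role(name, search_paths)) for name in role_names)


if __name__ == "__main__":
    # Role names come from the container.yml above. The search paths mirror the
    # order in the error message; "/path/to/roles" stands in for whatever was
    # passed to --roles-path.
    roles = ["BuildBox/Base", "BuildBox/Configuration4"]
    paths = ["./roles", "/path/to/roles", "/src/roles", "/etc/ansible/roles", "."]
    for name, found in sorted(report_roles(roles, paths).items()):
        print("%-28s %s" % (name, found or "NOT FOUND"))
```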

The fact that on the working-before-2018-06-05 docker host, I can delete the recently-added last role and build successfully suggests that some caching is happening and maybe some intermediary tool changed (c.f. #673) but I am unable to determine what and where.

Failures are not affected by the presence/absence of --debug and/or --use-local-python.

STEPS TO REPRODUCE

Create an on-iron virtualenv and set up environment as shown above
Create a Dockerfile with ansible-container environment as shown above
Set up the container.yml and various roles as described above
Run:

ansible-container build --services base --roles-path <wherever you put the roles>

EXPECTED RESULTS

working build, direct on-iron

ACTUAL RESULTS

debug output above, for ansible-container run in docker container on host, varies depending on host

@Voronenko
Contributor

Voronenko commented Jun 13, 2018

I have taken a look at your issue, using the POC repo from another issue as a base:

https://github.com/Nexlo/ansible-test

extending it to use two roles:

services:
  web:
    from: ubuntu:14.04
    roles:
      - role: role-2
        gather_facts: no
      - role: my-new-role
        gather_facts: no

On a clean virtualenv (py2, base OS Ubuntu 16.04 LTS), without the mentioned Dockerfile, it works like a charm for me. This makes me think the issue might not be in ansible-container but in your environment (i.e. the combination of dockerized ansible-container + conductor + container).

Perhaps you can create a POC repository for the issue, using the above https://github.com/Nexlo/ansible-test as a basis?

As an option, try to build with --no-container-cache,
i.e.

ansible-container build --no-container-cache --services base --roles-path <wherever you put the roles>

If it gets better, please comment here.

@dchsueh
Author

dchsueh commented Jun 14, 2018

Voronenko, I do appreciate you looking at this. (At this time it seems the support that ansible-container users might get post-2fa778a is ourselves!)

My original writeup is reeeeely long and the working/not-working scenarios are buried in too much other text:

  • on iron ubuntu xenial, inside virtualenv installed ansible-container - works fine
  • in docker container "FROM ubuntu:xenial", an identical install (not in virtualenv though) - fails but in different ways depending on what the docker host is

I agree that the main factor is the dockerized ansible-container setup. The thing that strikes me as very strange is that the dockerized configuration was working fine for the month or two that I was using it successfully before approx June 5. And on the server that was working previously, the roles that now cannot be found are the roles I added after June 5; all the previously found and previously working roles still work.

Would you mind trying to run an ansible-container build in a container? Here's a minimal ubuntu:xenial Dockerfile that should run ansible-container successfully (mount in /var/run/docker.sock and your ansible code):

FROM ubuntu:xenial

WORKDIR /var/tmp

RUN apt-get -y update \
  && apt-get -y install curl python less

RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py \
  && python get-pip.py \
  && pip --disable-pip-version-check install pip==9.0.3 \
  && pip --disable-pip-version-check install setuptools==39.2.0 \
  && pip --disable-pip-version-check install docker==2.7.0 \
  && pip --disable-pip-version-check install ansible-container[docker]==0.9.2 \
  && sed -i "s/filters={'name': self.secrets_volume_name}//g" /usr/local/lib/python2.7/dist-packages/container/docker/secrets.py \
  && sed -i "s/return os.path.join(os.sep, 'run', 'secrets')/return os.path.join(os.sep, 'docker', 'secrets')/g" /usr/local/lib/python2.7/dist-packages/container/docker/engine.py \
  && true
# sed filters addresses ansible-container bug described in https://github.com/moby/moby/issues/34121
# sed return is workaround for https://github.com/ansible/ansible-container/issues/762

RUN curl https://get.docker.com/builds/Linux/x86_64/docker-17.04.0-ce.tgz | tar -zxC /usr/local/bin/ --strip-components=1 docker/docker

pip freeze output in both in-virtualenv working and global-env nonworking ubuntu situations is:

$ pip freeze
ansible-container==0.9.2
backports.ssl-match-hostname==3.5.0.1
certifi==2018.4.16
chardet==3.0.4
colorama==0.3.9
docker==2.7.0
docker-pycreds==0.3.0
idna==2.7
ipaddress==1.0.22
Jinja2==2.10
MarkupSafe==1.0
PyYAML==3.12
requests==2.19.0
ruamel.ordereddict==0.4.13
ruamel.yaml==0.15.38
six==1.11.0
structlog==18.1.0
urllib3==1.23
websocket-client==0.48.0

--
edit: changed dockerfile from centos:7 to ubuntu:xenial

@Voronenko
Contributor

On the one hand, I confirm the issue (i.e. in some circumstances a role is not found if the project is mapped to a different path than on the original host); on the other hand, the whole approach is erroneous:

  1. You bind the docker sock from an (unknown) docker version, i.e. only you know it
  2. On the other hand, you install a very specific version of docker (potentially incompatible with that sock) inside the container: RUN curl https://get.docker.com/builds/Linux/x86_64/docker-17.04.0-ce.tgz

i.e. summary at that point: I would not do it that way, and would instead go with local python with ansible-container in a virtualenv

  1. The build definitely happens on the target host, i.e. if you map your working folder to exactly the same location, something like
    -v /home/slavko/tmp/ansible-test:/home/slavko/tmp/ansible-test \
    and not
    -v /home/slavko/tmp/ansible-test:/app \

the docker process starts to find the mentioned roles and even tries to build.
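
One possible reading of why the path matters: with the host's docker.sock mounted, any bind-mount source path handed to the daemon is resolved on the host's filesystem, not inside the container that issued the request. A small sketch of building a docker run command line that keeps the project at the same absolute path on both sides follows. The image name and the docker_run_args helper are hypothetical, and nothing here is part of ansible-container itself.

```python
import os


def docker_run_args(project_dir, image, command):
    """Build a `docker run` argv that mounts project_dir at the same absolute path."""
    host_path = os.path.abspath(project_dir)
    return [
        "docker", "run", "--rm",
        # Share the host daemon's socket, as in the setup described in this thread.
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        # Same path on both sides of the colon, so paths the inner tool hands
        # to the daemon also exist on the host.
        "-v", "%s:%s" % (host_path, host_path),
        "-w", host_path,
        image,
    ] + list(command)


if __name__ == "__main__":
    # "my-ansible-container:0.9.2" is a made-up tag for an image built from a
    # Dockerfile like the one earlier in this thread.
    args = docker_run_args(
        "/home/slavko/tmp/ansible-test",
        "my-ansible-container:0.9.2",
        ["ansible-container", "build", "--roles-path", "roles"],
    )
    print(" ".join(args))
```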

I would not build docker-from-docker with a mapped sock. Using a TCP port? Who knows, it seems more reliable; at least it will send the context there.

Hope that helps

@dchsueh
Author

dchsueh commented Jun 15, 2018

your suggestions and analysis give me some good ideas on investigating a workaround or alternate approaches

I'll report back if anything ends up successful

(the idea of curl-ing the docker binary directly into the image comes from how the conductor images are created - "docker history --no-trunc ansible/container-conductor-centos-7:0.9.2")

thank you

@Voronenko
Contributor

Your comment about the conductor is right. So this is rather an API compatibility issue.
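
Whether an API mismatch is really the cause was not established in this thread, but it can be checked. The sketch below compares a client API version against the range a daemon reports; `docker version` prints the client's "API version" and, for the server, an API version plus a minimum version. The helper names and the numbers in the example are illustrative only, and the check assumes a client that does not negotiate its API version down.

```python
def parse_api_version(text):
    """Turn a version string such as '1.28' into a comparable tuple (1, 28)."""
    return tuple(int(part) for part in text.strip().split("."))


def client_in_range(client_api, server_api, server_min_api):
    """True if the client API version lies within the daemon's supported range."""
    client = parse_api_version(client_api)
    return parse_api_version(server_min_api) <= client <= parse_api_version(server_api)


if __name__ == "__main__":
    # Illustrative numbers only; read the real ones from `docker version`
    # inside the container (client) and on the host (server).
    print(client_in_range("1.28", "1.37", "1.12"))
    print(client_in_range("1.28", "1.24", "1.12"))
```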


2 participants