Overview
In this scenario, the customer has called with this problem: Backups are failing
Problem Description: The customer attempts to disjoin a DXi from Active Directory, but on the form that requests the administrator username and password, the fields do not accept any input.
When the customer configures the system for segmentation, he is left with an inaccessible GUI or a GUI that does not reflect accurate information.
Available Information: Tech Support has the logs from Samba, the OS, and the DXi software.
Tech Support’s Initial Thoughts: Tech Support tries to ssh to the system and remove it from the domain manually (remove the .cifs_configured file, refresh the GUI page, and add the system to a workgroup). The Tech Support person runs grep -r for a particular string against the entire filesystem in an attempt to find the relevant configuration file.
Questions from Tech Support: Why doesn’t the GUI allow input when disjoining from Active Directory? Why does it happen sometimes and not others? What is the most appropriate way to help the customer? Should Service/Support have looked at something else in the Active Directory? If so, what and why?
--------------------------------------------------------------------------------------------
Next Steps: Responses from Engineering
Starting with Galaxy 1.4, disjoining is always a successful operation. See the “Solution for Galaxy 1.4 and later” section below for explanation. Most of the following discussion applies to Pre-Galaxy 1.4.
What should I look for?
Open the collect log and browse to directory node1-collection/nas-info and check the following files:
- Check the file listdir_node1.out, which captures the output of the command “ls -la /snfs/common/galaxy-config/nas/node1”.
  - If the output shows the file .cifs_configured, the DXi was still joined to a domain or workgroup.
  - Otherwise, the DXi was in the unjoined state: either the disjoin operation succeeded or the DXi was never joined. Neither case is a concern.
- If the DXi was in the joined state, check the file smb.conf:
  - If the file contains the line “security = ADS”, the DXi was in the ADS domain whose name is listed on the line starting with “realm =”.
  - If the file contains the line “security = user”, the DXi was in a workgroup whose name is listed on the line starting with “workgroup =”.
- If the DXi was in an ADS domain, check the file cifs.ads.leave. This file is the log of the latest run of the net command to disjoin the DXi from the domain. Interpreting it fully requires Samba experience, but if the disjoin failed, look for lines that contain words such as “error” or “failed”.
- If the ADS disjoin is determined to have failed, it is fairly easy to work around. A failed disjoin is much easier to fix than a failed join.
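The checks above can be sketched as a small shell helper run against an unpacked collect log. This is only an illustration: the file names come from the list above, but the function names, the output labels, and the assumption that smb.conf keys are written in lowercase are not from the original procedure.

```shell
# Sketch only: classify the CIFS state recorded in a collect log's
# node1-collection/nas-info directory. Function names and output labels
# are illustrative; file names are the ones listed in the text above.
cifs_state() {
    dir=$1
    listing="$dir/listdir_node1.out"
    conf="$dir/smb.conf"

    # Without the directory listing we cannot say anything.
    [ -r "$listing" ] || { echo "unknown"; return 1; }

    # No .cifs_configured entry in the "ls -la" output: unjoined.
    if ! grep -Eq '(^|[[:space:]])\.cifs_configured[[:space:]]*$' "$listing"; then
        echo "unjoined"
        return 0
    fi

    # Joined: smb.conf says whether it is an ADS domain or a workgroup.
    # Assumption: the keys "security", "realm", "workgroup" are lowercase.
    if grep -Eiq '^[[:space:]]*security[[:space:]]*=[[:space:]]*ads[[:space:]]*$' "$conf" 2>/dev/null; then
        echo "ads:$(sed -n 's/^[[:space:]]*realm[[:space:]]*=[[:space:]]*//p' "$conf" | head -n 1)"
    elif grep -Eiq '^[[:space:]]*security[[:space:]]*=[[:space:]]*user[[:space:]]*$' "$conf" 2>/dev/null; then
        echo "workgroup:$(sed -n 's/^[[:space:]]*workgroup[[:space:]]*=[[:space:]]*//p' "$conf" | head -n 1)"
    else
        # Joined marker present but smb.conf does not identify the mode.
        echo "joined:unknown"
    fi
}

# Print numbered lines of the last disjoin log that mention an error or failure.
leave_errors() {
    grep -Ein 'error|fail' "$1/cifs.ads.leave"
}
```

For example, `cifs_state node1-collection/nas-info` prints one of the labels above, and `leave_errors node1-collection/nas-info` lists the candidate lines to read first. A "joined:unknown" result would be consistent with an smb.conf that has lost its ADS information.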
Why is there a failure to disjoin from ADS?
There are many reasons:
- DNS not working
- Kerberos server not working
- Invalid credentials:
  - The username entered in the GUI text box must not have a domain name prefix.
  - The user must be a domain user: either a domain administrator or a regular user who has been granted the right to join the domain.
- The customer logged into ADS and manually deleted the DXi computer account instead of using the DXi GUI to disjoin. This case is discussed in detail in the next section.
- Old administrator account is no longer available and no one has been granted the right to join/disjoin anymore.
- Domain name has changed
- Primary domain controller has changed
- etc
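The credential rule above (no domain name prefix on the username) can be checked quickly before retrying a disjoin. The following is a minimal sketch that assumes the prefix takes the usual Windows form DOMAIN\user; the function name is hypothetical and is not part of the DXi software.

```shell
# Sketch only: succeed (exit 0) if the username carries a DOMAIN\ prefix.
# Assumption: a "domain name prefix" means the DOMAIN\user form.
has_domain_prefix() {
    case $1 in
        *\\*) return 0 ;;   # contains a backslash, e.g. CORP\alice
        *)    return 1 ;;   # plain username, e.g. alice
    esac
}
```

A support script could call `has_domain_prefix "$username"` and, if it succeeds, ask the customer to re-enter the name without the prefix.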
Why doesn’t the GUI allow input when disjoining from Active Directory? Why does it happen sometimes and not others?
The short answer is that the CIFS configuration file (/etc/samba/smb.conf) has been tampered with or corrupted, and the ADS information in it is lost. As a result, the GUI cannot recognize the domain state (ADS or workgroup) of the CIFS server.
The proper procedure is to disjoin using the DXi GUI first. This step can then optionally be followed by using the MMC tool to delete the DXi machine account from the domain. Some ADS administrators tend to do the reverse.
The following example explains how this happened in the past:
One common scenario at customer sites involves experienced ADS administrators who are very familiar with disjoining Windows systems from an ADS domain. Their mistake is to first log on to the ADS server and use the MMC (Microsoft Management Console) to remove the DXi from the domain. This changes the ADS database without informing the DXi system.
Therefore, the DXi system still believes it is in the domain and this is reflected on the GUI. Because the customers see the DXi still in the domain, they will disjoin it using the DXi GUI. As part of the disjoining process, the DXi will ask the ADS server to disable the DXi computer account. Because the ADS server has no record of the DXi (it was already deleted), it will fail the DXi request.
When this happens, the disjoin process fails, and the GUI offers customers no way to repair the damage. Some customers then ssh to the DXi system and tamper with the smb.conf file. If the ADS information in this file is changed, the GUI will not display any input text boxes, and a support call is needed to fix the CIFS configuration.
Solutions
Manual solution for all Galaxy releases
- ssh to the DXi system
- Remove this file (it is empty): /snfs/common/galaxy-config/nas/node1/.cifs_configured
- Refresh the GUI to see that the DXi is now in the unjoined state.
- The DXi is now ready for joining to a workgroup or ADS
Note that there is no need to look at the ADS for this procedure. The DXi is now in the unjoined state, but its computer account is still active in the ADS domain. To remove it from the domain completely, the customer has to log on to the ADS server and use the MMC tool to remove the DXi account.
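As a rough illustration of the file-removal step in the manual procedure, the sketch below wraps it in a shell function. The default path is the one given above; the function name and the optional directory argument (useful for trying it against a scratch directory first) are assumptions made for this example.

```shell
# Sketch only: remove the .cifs_configured file so the DXi reports the
# unjoined state. The optional first argument overrides the directory,
# which is an illustrative addition, not part of the original procedure.
cifs_force_unjoin() {
    nas_dir=${1:-/snfs/common/galaxy-config/nas/node1}
    marker="$nas_dir/.cifs_configured"

    if [ -e "$marker" ]; then
        rm -f -- "$marker" && echo "removed $marker"
    else
        echo "already unjoined: $marker not present"
    fi
}
```

After running `cifs_force_unjoin` in an ssh session on the DXi, refresh the GUI to confirm the unjoined state. The function does not touch the computer account in the ADS.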
Solution for Pre-Galaxy 1.2
See the manual procedure above. That’s the only solution.
Solution for Galaxy 1.2 and later
The manual solution still works. But the following syscli command can also be used for all Galaxy releases starting from 1.2: syscli --cifs --deactivate
Solution for Galaxy 1.4 and later
The manual solution still works. But starting with Galaxy 1.4, the disjoin operation has been redesigned to always succeed whether it is run from the GUI or the syscli command-line tool, even if the user enters a bogus username and password. The only difference is as follows:
- If the username and password are correct and the user has the right to join the domain, the disjoin process succeeds and the DXi machine account is disabled.
- If the username and password are bogus, the disjoin process still succeeds, but the DXi machine account remains enabled in the ADS.
In both cases, the user has to use the MMC tool to delete the DXi machine account from the domain.
Note: This redesign has been adopted to address the following valid scenarios, which can easily arise if the DXi is moved from one company to another:
- Domain name has changed
- Primary domain controller has changed
- Old administrator account is no longer available and no user has been granted the right to join/disjoin anymore.
- etc.
Solution for Galaxy 2.0
CT SAID: What is the response here?