How do permission groups work in HDFS by default? Why do all users' files belong to supergroup?
Well, long story short, it turned out security was disabled after all. I just didn't know that the server-side services do not use /etc/hadoop/conf; each one instead has its own configuration under /var/run/cloudera-scm-agent/process/_process-name/. These files can also be seen in the CM UI, e.g. CM -> HDFS -> Instances -> NameNode -> Processes -> hdfs-site.xml.
http://community.cloudera.com/t5/Storage-Random-Access-HDFS/HDFS-default-permissioning-workes-weird-CDH5-1/m-p/24137#U24137
Difference between Superuser and supergroup in Hadoop
Superuser
According to the official Hadoop documentation:
The super-user is the user with the same identity as the name node process itself. Loosely, if you started the name node, then you are the super-user. The super-user can do anything in that permissions checks never fail for the super-user.
Supergroup
Supergroup is the group of superusers. Membership in this group grants a Hadoop client superuser access. It can be configured with the dfs.permissions.superusergroup property in the hdfs-site.xml file; its default value is supergroup, which is why files created by superusers show that group.
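For example, a minimal hdfs-site.xml entry might look like the following (the group name hdfsadmins is an illustrative assumption; substitute any local OS group whose members should be HDFS superusers):

```xml
<!-- hdfs-site.xml: members of the OS group "hdfsadmins" become HDFS superusers -->
<!-- "hdfsadmins" is an example name; the default value is "supergroup" -->
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hdfsadmins</value>
</property>
```

After changing this property, the NameNode must be restarted for it to take effect.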
References
- Hadoop superuser and supergroup
Hadoop: Pseudo Distributed mode for multiple users
Adding a dedicated Hadoop system user
We will use a dedicated Hadoop user account for running Hadoop. While that's not required, it is recommended because it helps separate the Hadoop installation from other software and user accounts running on the same machine (think: security, permissions, backups, etc.).
# addgroup hadoop
# adduser --ingroup hadoop hadoop1
# adduser --ingroup hadoop hadoop2
This will add the users hadoop1 and hadoop2 and the group hadoop to your local machine.
Change the ownership of your Hadoop installation directory:
# chown -R hadoop1:hadoop hadoop
And lastly, change the ownership of the Hadoop temporary directory. If your temp directory is /app/hadoop/tmp:
# mkdir -p /app/hadoop/tmp
# chown hadoop1:hadoop /app/hadoop/tmp
And if you want to tighten up security, change the mode from 755 to 750:
# chmod 750 /app/hadoop/tmp
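As a quick sanity check of what the 750 mode actually changes, here is a sketch you can run against a scratch directory (it uses mktemp instead of the real /app/hadoop/tmp so it is safe to run anywhere):

```shell
# Create a scratch directory to compare the two modes safely
tmpdir=$(mktemp -d)

chmod 755 "$tmpdir"
stat -c '%a' "$tmpdir"   # 755: group and others may enter and list the directory

chmod 750 "$tmpdir"
stat -c '%a' "$tmpdir"   # 750: only the owner and the group retain access

rmdir "$tmpdir"
```

With 750, users outside the hadoop group cannot enter or list the temp directory at all, which is usually what you want for Hadoop's scratch space.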
Adding ec2-user to use hadoop
SSH in as hadoop@(publicIP) for Amazon EMR. From there you can do anything you like with HDFS without having to su. I just did an mkdir and ran distcp and a streaming job; I do everything as hadoop@, as per the EMR instructions.