Comparing records in two Hive tables having the same schema
This is what you can do:
Join both tables on the unique key (I assume your tables have a unique identifier).
Then compare the hash value of all the columns combined, using Hive's hash function, to find the rows that differ. The query will look like this:
select * from tab1 a join tab2 b
on a.id = b.id
where hash(a.col1, a.col2....) <> hash(b.col1, b.col2...);
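The inner join above only reports ids present in both tables. As a rough sketch, a full outer join can also flag ids that exist on one side only. Here col1 and col2 stand in for the full column list, and the diff_type labels are made up for illustration. Note that hash returns an int, so two different rows can occasionally hash to the same value and be missed.

```sql
-- Sketch only: col1, col2 stand in for the full column list;
-- the diff_type labels are illustrative, not from the original answer.
SELECT coalesce(a.id, b.id) AS id,
       CASE WHEN a.id IS NULL THEN 'only_in_tab2'
            WHEN b.id IS NULL THEN 'only_in_tab1'
            ELSE 'changed'
       END AS diff_type
FROM tab1 a
FULL OUTER JOIN tab2 b
  ON a.id = b.id
WHERE a.id IS NULL                 -- row exists only in tab2
   OR b.id IS NULL                 -- row exists only in tab1
   OR hash(a.col1, a.col2) <> hash(b.col1, b.col2);  -- same id, different content
```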
How to apply multiple counts in a Hive query
Use group by with count(*) as the aggregation in your select query.
Try this query:
select act,count(*) from <table_name> group by act;
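If several different counts are needed per group, they can usually be computed in one pass. The sketch below assumes a hypothetical table events with hypothetical columns user_id and status; only act comes from the query above.

```sql
-- Sketch only: events, user_id and status are hypothetical names.
SELECT act,
       count(*)                                       AS total_rows,      -- all rows in the group
       count(DISTINCT user_id)                        AS distinct_users,  -- unique users in the group
       sum(CASE WHEN status = 'ok' THEN 1 ELSE 0 END) AS ok_rows          -- conditional count
FROM events
GROUP BY act;
```

The sum over a CASE expression is a common way to count only the rows that satisfy a condition without writing a separate query per condition.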
Comparing two tables for equality in HIVE
The first one excludes rows where t1.c1, t1.c2, t1.c3, t2.c1, t2.c2, or t2.c3 is null. That means you are effectively doing an inner join.
The second one will find rows that exist in t1 but not in t2.
To also find rows that exist in t2 but not in t1, you can do a full outer join. The following SQL assumes that all columns are NOT NULL:
select count(*) from table1 t1
full outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t1.key is null /* this condition matches rows that only exist in t2 */
or t2.key is null /* this condition matches rows that only exist in t1 */
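When the columns may contain NULLs, the equality conditions in the join stop matching. One possible workaround, sketched here as a different technique from the join above, is to stack both tables with UNION ALL and group on every column, since GROUP BY places NULLs in the same group. The in_t1 and in_t2 marker columns are introduced only for this illustration.

```sql
-- Sketch only: in_t1 / in_t2 are helper columns invented for this example.
SELECT key, c1, c2, c3,
       sum(in_t1) AS cnt_t1,   -- how many times the row occurs in table1
       sum(in_t2) AS cnt_t2    -- how many times the row occurs in table2
FROM (
  SELECT key, c1, c2, c3, 1 AS in_t1, 0 AS in_t2 FROM table1
  UNION ALL
  SELECT key, c1, c2, c3, 0 AS in_t1, 1 AS in_t2 FROM table2
) u
GROUP BY key, c1, c2, c3
HAVING sum(in_t1) <> sum(in_t2);   -- keep only rows whose counts differ
```

An empty result suggests the two tables hold the same rows with the same multiplicities. Any row returned is either missing from one side or duplicated a different number of times.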
Hive: Joining two tables with different keys
It's a little difficult to do this in Hive because of its many limitations. This is how I solved it, but there could be a better way.
I named your tables as below.
Table1 = EmpActivity
Table2 = ActivityMas
The challenge comes from the null fields in Table2. I created a view and used UNION ALL to combine the results of two distinct queries.
Create view actView AS Select * from ActivityMas Where ActivityId = '';
SELECT * From (
Select EmpActivity.EmpId, EmpActivity.Category, ActivityMas.categdesc
from EmpActivity JOIN ActivityMas
ON EmpActivity.Category = ActivityMas.Category
AND EmpActivity.ActivityId = ActivityMas.ActivityId
UNION ALL
Select EmpActivity.EmpId, EmpActivity.Category, actView.categdesc from EmpActivity
JOIN actView ON EmpActivity.Category = actView.Category
) u;
You have to wrap the query in a top-level SELECT because UNION ALL is not directly supported in top-level statements. This runs 3 MR jobs in total. Below is the result I got.
44127 10 billable
44128 12 billable
44130 15 Non-billable
44132 43 Benefits
44131 33 Benefits
44126 33 Training
44129 33 Bench
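A possible alternative, offered only as a sketch, is to use two left outer joins and prefer the exact match when there is one. It assumes the actView view defined above, at most one ActivityMas row per (Category, ActivityId), and at most one empty-ActivityId row per Category. Unlike the UNION ALL version, a row that matches both ways appears once, with the exact match taking precedence, so the output is not guaranteed to be identical.

```sql
-- Sketch only: relies on the actView view created earlier and on the
-- uniqueness assumptions stated in the lead-in.
SELECT e.EmpId,
       e.Category,
       coalesce(m.categdesc, d.categdesc) AS categdesc  -- exact match first, category default second
FROM EmpActivity e
LEFT OUTER JOIN ActivityMas m
  ON e.Category = m.Category
 AND e.ActivityId = m.ActivityId
LEFT OUTER JOIN actView d
  ON e.Category = d.Category
WHERE m.Category IS NOT NULL      -- drop rows that matched neither way,
   OR d.Category IS NOT NULL;     -- mirroring the inner joins above
```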
Best way to compare three columns in sql Hive
Logically, one of your comparisons is redundant: if col1 = col2, then col1 != col3 implies col2 != col3.
Therefore it's really enough to use:
select * from T1 where col1 = col2 and col1 != col3;
It is appropriate to do this map side, so a where clause is likely good enough.
If you wanted to require that 2 out of the 3 columns match, you could use group by with having to reduce comparisons.
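For the "2 out of 3" case, a simpler option than group by with having is to count the matching pairs directly in the where clause. This is a sketch of that swapped-in technique, not the approach described above. It relies on the fact that exactly two columns match when exactly one of the three pairs is equal (if all three match, all three pairs are equal).

```sql
-- Sketch only: counts how many of the three column pairs are equal.
-- A comparison involving NULL falls through to ELSE 0.
SELECT *
FROM T1
WHERE (CASE WHEN col1 = col2 THEN 1 ELSE 0 END
     + CASE WHEN col1 = col3 THEN 1 ELSE 0 END
     + CASE WHEN col2 = col3 THEN 1 ELSE 0 END) = 1;  -- exactly two of the three columns match
```

Changing the final comparison to >= 1 would also keep rows where all three columns are equal.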