how to extract a part of a string in hive
Using the Hive regexp_extract(string subject, string pattern, int index) function:
SELECT regexp_extract(desc, '.*? (\\d+) .*$', 1) AS Revenue
FROM table1
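As a quick check of how the index argument behaves, the sketch below runs the same pattern against a literal string (the sample text is made up for illustration). Index 0 should return the whole match, while index 1 returns only the first capture group.

```sql
-- Sample text is hypothetical; the pattern is the one used in the query above.
SELECT
  regexp_extract('Total revenue 1250 USD', '.*? (\\d+) .*$', 0) AS whole_match,  -- 'Total revenue 1250 USD'
  regexp_extract('Total revenue 1250 USD', '.*? (\\d+) .*$', 1) AS revenue;      -- '1250'
```

Note that this pattern expects a space on both sides of the digits, so a number at the very start or end of desc would not be picked up. As far as I know, Hive returns an empty string rather than NULL when the pattern does not match.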
See other examples in:
- "Hive QL selecting numeric substring of string"
- "extracting a substring from a text column in hive"
Extract substring with a specific pattern in Hive SQL
Try using:
SELECT REGEXP_EXTRACT(colname, ".*(M6[^_]*).*", 1) FROM tableName
Regex used:
.*(M6[^_]*).*
Explanation:
- .* matches 0+ occurrences of any character that is not a newline character
- (M6[^_]*) matches M6 followed by 0+ occurrences of any character that is not a _. So, after M6, it keeps on matching everything until it finds the next _. The parentheses are used to store this sub-match in Group 1
- .* matches 0+ occurrences of any character that is not a newline character
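To see the pattern at work, here is a minimal sketch on a literal value (the input string is an assumption, not taken from the question). The second query shows one way to restrict the result to rows that actually contain such a token, using RLIKE for the filter.

```sql
-- The input value is hypothetical.
SELECT regexp_extract('AB_M6XYZ_2020', '.*(M6[^_]*).*', 1) AS part;  -- 'M6XYZ'

-- Keep only rows containing an M6 token
-- (tableName and colname are the placeholder names used in the answer).
SELECT colname, regexp_extract(colname, '.*(M6[^_]*).*', 1) AS part
FROM tableName
WHERE colname RLIKE 'M6';
```

Because the leading .* is greedy, a value with several M6 tokens yields the last one. Dropping the leading .* from the pattern should return the first one instead.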
Get a substring in hive
There are several ways you can extract hours from a timestamp value.
1. Using Substring function:
select substring(string("2017-06-05 09:06:32.0"),12,2);
+------+--+
| _c0 |
+------+--+
| 09 |
+------+--+
2. Using Regexp_Extract:
select regexp_Extract(string("2017-06-05 09:06:32.0"),"\\s(\\d\\d)",1);
+------+--+
| _c0 |
+------+--+
| 09 |
+------+--+
3. Using Hour:
select hour(timestamp("2017-06-05 09:06:32.0"));
+------+--+
| _c0 |
+------+--+
| 9 |
+------+--+
4. Using from_unixtime:
select from_unixtime(unix_timestamp('2017-06-05 09:06:32.0'),'HH');
+------+--+
| _c0 |
+------+--+
| 09 |
+------+--+
5. Using date_format:
select date_format(string('2017-06-05 09:06:32.0'),'HH');
+------+--+
| _c0 |
+------+--+
| 09 |
+------+--+
6. Using Split:
select split(split(string('2017-06-05 09:06:32.0'),' ')[1],':')[0];
+------+--+
| _c0 |
+------+--+
| 09 |
+------+--+
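The approaches above do not all return the same type or format, which only becomes visible with an afternoon timestamp. The sketch below uses a made-up value for that reason: hour() returns an int without a leading zero, and in date_format the pattern letters HH mean the 24-hour clock while hh means the 12-hour clock.

```sql
-- The afternoon timestamp is a hypothetical value chosen so that
-- 12-hour and 24-hour results differ.
SELECT
  substring(ts, 12, 2)  AS via_substring,   -- '14'
  hour(ts)              AS via_hour,        -- 14 (an int, no leading zero)
  date_format(ts, 'HH') AS via_format_24h,  -- '14'
  date_format(ts, 'hh') AS via_format_12h   -- '02'
FROM (SELECT '2017-06-05 14:06:32.0' AS ts) t;
```

If the result is used for grouping or joining, it is probably safest to pick one approach and use it consistently, since '09' and 9 are not interchangeable as strings.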
extracting a substring from a text column in hive
Use the regexp_extract function with a matching regex to capture only the displayName from your title field value.
Example:
hive> with tb as(select string('"id":"S-1-98-13474422323-33566802",
"name":"uid=Xzdpr0,ou=people,dc=vm,dc=com","shortName":"XZDPR0",
"displayName":"Jund Lee","emailAddress":"jund.lee@bm.com",
"title":"Leading Product Investor"')title)
select regexp_extract(title,'"displayName":"(.*?)"',1) title from tb;
+-----------+--+
| title |
+-----------+--+
| Jund Lee |
+-----------+--+
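The lazy quantifier .*? is what makes the capture stop at the first closing quote. A small sketch on a shortened, hypothetical version of the title value shows the difference from a greedy match, along with a negated character class that should behave like the lazy version here.

```sql
-- Shortened, hypothetical version of the title value used above.
SELECT
  regexp_extract(title, '"displayName":"(.*?)"', 1)   AS lazy_match,    -- Jund Lee
  regexp_extract(title, '"displayName":"(.*)"', 1)    AS greedy_match,  -- Jund Lee","emailAddress":"jund.lee@bm.com
  regexp_extract(title, '"displayName":"([^"]*)"', 1) AS negated_class  -- Jund Lee
FROM (SELECT '"displayName":"Jund Lee","emailAddress":"jund.lee@bm.com"' AS title) tb;
```

All three variants assume the display name itself contains no double quote.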
Single hive query to extract a piece of string
You could start from the first slash and take everything until the next space:
regexp_extract(testdata, '(/[^\\s]+)', 0)
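For illustration, here is the same call on a literal value (the request line below is only an assumed example of what testdata might contain):

```sql
-- The request line is a hypothetical value for testdata.
SELECT regexp_extract('GET /index.html HTTP/1.1', '(/[^\\s]+)', 0) AS path;  -- '/index.html'
```

Since the parentheses wrap the entire pattern, index 0 and index 1 return the same text here. Only the first slash-led token is returned, so the later /1.1 is ignored.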
Impala/Hive function to get the substring of a string
Try:
REGEXP_EXTRACT('your string', ':abd: ([^:]+)', 1)
The regexp :abd: ([^:]+)
means: match ':abd: ' followed by any characters that are not ':'.
This regexp assumes that ':' does not appear within the "value" strings. As such, it would fail on this input:
:abd: 5768:92034 :erg: 94856023MXCI :oute: A RF WERS YUT :oowpo: 649217349GBT GB
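The sketch below shows what the original pattern returns for that input, next to one possible workaround: match lazily up to the next tag or the end of the string. The workaround assumes every tag is a run of lowercase letters wrapped in colons with a space on each side. I have only reasoned about it for Hive's Java regex engine, so treat it as untested on Impala.

```sql
SELECT
  regexp_extract(s, ':abd: ([^:]+)', 1)             AS original_pattern,  -- '5768'
  regexp_extract(s, ':abd: (.*?)( :[a-z]+: |$)', 1) AS until_next_tag     -- '5768:92034'
FROM (SELECT ':abd: 5768:92034 :erg: 94856023MXCI :oute: A RF WERS YUT :oowpo: 649217349GBT GB' AS s) t;
```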