Introduction
DataSet parsers have a number of predefined patterns. Although it isn't possible to modify them directly, you can define your own as needed. This is beneficial for a number of reasons:
- Minimizes parser length
- Improves maintainability: regular expressions can be updated from a central location within the parser
- Ensures that only data in the format you define is extracted
Please note that, due to the structure of our parsers, patterns can only be implemented as part of an attribute. For example, $attribute=pattern$ would apply the regex in pattern to whatever $attribute extracts. However, patterns can't be used outside of an attribute definition (e.g. in a rewrites statement or inline within a format string).
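To make the $attribute=pattern$ idea concrete, here is a rough Python analogue. PATTERNS, compile_format, and the numWithCommas pattern name are hypothetical and are not DataSet parser syntax. The sketch only shows how naming a regex once and referencing it inside attribute placeholders keeps the regex in one place and applies it solely to that attribute's value.

```python
import re

# Hypothetical central registry of user-defined patterns (not DataSet syntax).
# Changing a regex here changes every format that refers to it by name.
PATTERNS = {
    "numWithCommas": r"[0-9,]+",
}

# Matches $attribute$ or $attribute=patternName$ placeholders.
PLACEHOLDER = re.compile(r"\$(\w+)(?:=(\w+))?\$")


def compile_format(fmt: str) -> re.Pattern:
    """Turn a format string such as 'param=$param=numWithCommas$' into a
    regex with one named group per attribute."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(fmt):
        # Literal text between placeholders must match verbatim.
        parts.append(re.escape(fmt[pos:m.start()]))
        name, pattern_name = m.group(1), m.group(2)
        # A named pattern is only ever applied to an attribute's value;
        # without one, fall back to a generic non-whitespace token.
        regex = PATTERNS[pattern_name] if pattern_name else r"\S+"
        parts.append(f"(?P<{name}>{regex})")
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("".join(parts))


# Two formats reuse the same pattern by name.
single = compile_format("param=$param=numWithCommas$")
pair = compile_format("total=$total=numWithCommas$ count=$count$")
print(single.search("param=160,000,000").group("param"))  # 160,000,000
print(pair.search("total=1,234 count=7").groupdict())     # {'total': '1,234', 'count': '7'}
```

Because the regex lives only in PATTERNS, editing numWithCommas updates both formats at once.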
User-defined patterns can also be applied within the context of key/value pair extraction. In contrast, reserved patterns such as json or pythonDict use additional logic to iterate through delimited log events and can't be defined or modified at the user level. For additional flexibility, we recommend using the attrBlacklist and attrWhitelist extensions to fine-tune which attributes are ingested.
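A hypothetical Python sketch of key/value extraction constrained by a user-defined value pattern, followed by filtering on attribute names. The whitelist and blacklist parameters only loosely mimic the attrWhitelist and attrBlacklist extensions; their exact behavior in DataSet is assumed here, not documented.

```python
import re

# Generic tokenizer: a word-character key, '=', then a run of non-whitespace.
KV = re.compile(r"(\w+)=(\S+)")


def extract_kv(line, value_pattern=r"\S+", whitelist=None, blacklist=()):
    """Hypothetical key/value extraction: keep a pair only if its value fully
    matches value_pattern and its key passes the allow/deny filters."""
    value_re = re.compile(value_pattern)
    attrs = {}
    for key, value in KV.findall(line):
        if not value_re.fullmatch(value):
            continue  # value is not in the user-defined format
        if whitelist is not None and key not in whitelist:
            continue  # not on the allow list
        if key in blacklist:
            continue  # explicitly denied
        attrs[key] = value
    return attrs


line = "a=1,000 b=abc c=2"
print(extract_kv(line, r"[0-9,]+"))                   # {'a': '1,000', 'c': '2'}
print(extract_kv(line, r"[0-9,]+", blacklist={"c"}))  # {'a': '1,000'}
```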
Example - Remove Commas from Numeric Values
The following parser removes formatting commas from any numeric fields it extracts. Patterns ensure that each value is first captured in full, commas included.
Test Entries
param=160,000,000
param=10,000
param=9,000
Parser
Results
param=160,000,000
message: param=160,000,000
param: 160000000
param=10,000
message: param=10,000
param: 10000
param=9,000
message: param=9,000
param: 9000
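The same two steps (capture the value with its commas, then strip them) can be sketched in Python as a rough analogue. The regex [0-9,]+ stands in for the user-defined pattern and is an assumption, since the parser definition is not included in this article; the comma removal is shown as a plain string replacement.

```python
import re

# Capture the value *with* its commas, analogous to a pattern such as [0-9,]+
PARAM = re.compile(r"param=([0-9,]+)")


def parse(line):
    m = PARAM.search(line)
    if m is None:
        return None
    raw = m.group(1)
    # Second step: strip the formatting commas from the captured value.
    return {"message": line, "param": raw.replace(",", "")}


for entry in ["param=160,000,000", "param=10,000", "param=9,000"]:
    print(parse(entry))
```

This prints one dictionary per test entry, with param values 160000000, 10000, and 9000, matching the results above (as strings here rather than numbers).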
Example - Sonicwall Firewall Logs
This approach can be extended further to parse Sonicwall firewall logs. See the comments within the gist for more information.
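Much of this example is key=value extraction where values may be double-quoted, single-quoted, or bare. Below is a minimal Python sketch of that tokenizing step; it is not DataSet's implementation. It skips empty values and keeps the first occurrence of a repeated key (ROW_START, match_id, layer_name, and rule_name each appear twice in the second entry). Both behaviors are assumptions chosen to agree with the extracted attributes listed below.

```python
import re

# key=value where the value is double-quoted, single-quoted, or a bare token.
KV = re.compile(r"""(\w+)=(?:"([^"]*)"|'([^']*)'|(\S+))""")


def parse_kv(line):
    attrs = {}
    for m in KV.finditer(line):
        key = m.group(1)
        # Exactly one of the three value groups participates in each match.
        value = next(g for g in m.group(2, 3, 4) if g is not None)
        # Skip empty values (e.g. xlatesrc="") and keep the first occurrence
        # of a repeated key.
        if value and key not in attrs:
            attrs[key] = value
    return attrs


print(parse_kv("""id=Sonicwall_Unit msg="Connection Opened" appName='General HTTPS' dpi=0"""))
# {'id': 'Sonicwall_Unit', 'msg': 'Connection Opened', 'appName': 'General HTTPS', 'dpi': '0'}
print(parse_kv('match_id="11" match_id="16777221" xlatesrc=""'))
# {'match_id': '11'}
```

Quoted alternatives are tried before the bare token, so a value such as src_user_dn="CN=...,OU=..." is consumed whole and its inner = signs are never treated as new keys.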
Test Entries
<134> id=Sonicwall_Unit sn=12345 time="2021-08-08 20:09:14" fw=192.168.4.10 pri=5 c=262144 gcat=6 m=98 msg="Connection Opened" src=10.0.0.2:56096:XX dst=10.0.0.3:443:XY proto=tcp/https sent=52 app=49177 appName='General HTTPS' n=41788223 fw_action="NA" dpi=0 app: 49177
<85>Aug 08 20:09:15--7:00 192.168.18.254 Action="accept" service_id="Robo_Towhee-NT001" src="10.0.0.5" dst="10.0.0.7" proto="6" xlatesrc="" xlatedst="192.168.19.251" NAT_rulenum="13" NAT_addtnl_rulenum="1" security_inzone="TestZone" security_outzone="" user="Aldous Huxley (aldous)(+)" src_user_name="Aldous Huxley (aldous)(+)" src_machine_name="linuxbox.towhee.lan" src_user_dn="CN=Aldous HUxley,OU=Robo_Towhee-NT001,OU=Users,OU=HO,OU=BDC,DC=internal,DC=towhee,DC=lan(+)" snid="" dst_user_name="" dst_machine_name="" dst_user_dn="" UP_match_table="TABLE_START" ROW_START="0" match_id="11" layer_uuid="9d9a7ecf-708b-4f77-a55c-48b21a38caab" layer_name="Network (whee)" rule_uid="e384160e-f955-11eb-9a03-0242ac130003" rule_name="Misc" ROW_END="0" ROW_START="1" match_id="16777221" layer_name="Test Layer" rule_name="Misc Test 2" ROW_END="1" UP_match_table="TABLE_END" ProductName="FireWall" svc="1234" sport_svc="12345" xlatedport_svc="" xlatesport_svc="" ProductFamily=""
The attributes are extracted as follows:
<134> id=Sonicwall_Unit sn=12345 time="2021-08-08 20:09:14" fw=192.168.4.10 pri=5 c=262144 gcat=6 m=98 msg="Connection Opened" src=10.0.0.2:56096:XX dst=10.0.0.3:443:XY proto=tcp/https sent=52 app=49177 appName='General HTTPS' n=41788223 fw_action="NA" dpi=0 app: 49177
app: 49177
appName: General HTTPS
c: 262144
dpi: 0
dst: 10.0.0.3:443:XY
fw: 192.168.4.10
fw_action: NA
gcat: 6
id: Sonicwall_Unit
m: 98
message: ... Truncated ...
msg: Connection Opened
n: 41788223
pri: 5
proto: tcp/https
sent: 52
sn: 12345
src: 10.0.0.2:56096:XX
time: 2021-08-08 20:09:14
timestamp: 2021-08-08 20:09:14 (parsed as: Mon Aug 9, 2021 3:09:14 AM GMT)
<85>Aug 08 20:09:15--7:00 192.168.18.254 Action="accept" service_id="Robo_Towhee-NT001" src="10.0.0.5" dst="10.0.0.7" proto="6" xlatesrc="" xlatedst="192.168.19.251" NAT_rulenum="13" NAT_addtnl_rulenum="1" security_inzone="TestZone" security_outzone="" user="Aldous Huxley (aldous)(+)" src_user_name="Aldous Huxley (aldous)(+)" src_machine_name="linuxbox.towhee.lan" src_user_dn="CN=Aldous HUxley,OU=Robo_Towhee-NT001,OU=Users,OU=HO,OU=BDC,DC=internal,DC=towhee,DC=lan(+)" snid="" dst_user_name="" dst_machine_name="" dst_user_dn="" UP_match_table="TABLE_START" ROW_START="0" match_id="11" layer_uuid="9d9a7ecf-708b-4f77-a55c-48b21a38caab" layer_name="Network (whee)" rule_uid="e384160e-f955-11eb-9a03-0242ac130003" rule_name="Misc" ROW_END="0" ROW_START="1" match_id="16777221" layer_name="Test Layer" rule_name="Misc Test 2" ROW_END="1" UP_match_table="TABLE_END" ProductName="FireWall" svc="1234" sport_svc="12345" xlatedport_svc="" xlatesport_svc="" ProductFamily=""
Action: accept
NAT_addtnl_rulenum: 1
NAT_rulenum: 13
ProductName: FireWall
ROW_END: 0
ROW_START: 0
UP_match_table: TABLE_START
dst: 10.0.0.7
hostIP: 192.168.18.254
layer_name: Network (whee)
layer_uuid: 9d9a7ecf-708b-4f77-a55c-48b21a38caab
match_id: 11
message: ... Truncated ...
proto: 6
rule_name: Misc
rule_uid: e384160e-f955-11eb-9a03-0242ac130003
security_inzone: TestZone
service_id: Robo_Towhee-NT001
sport_svc: 12345
src: 10.0.0.5
src_machine_name: linuxbox.towhee.lan
src_user_dn: CN=Aldous HUxley,OU=Robo_Towhee-NT001,OU=Users,OU=HO,OU=BDC,DC=internal,DC=towhee,DC=lan(+)
src_user_name: Aldous Huxley (aldous)(+)
svc: 1234
timestamp: Aug 08 20:09:15 (parsed as: Mon Aug 9, 2021 3:09:15 AM GMT)
timezone: GMT-7:00
ts: Aug 08 20:09:15--7:00
user: Aldous Huxley (aldous)(+)
xlatedst: 192.168.19.251
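The ts, timestamp, and timezone attributes of the second entry come from a value like Aug 08 20:09:15--7:00. A Python sketch of that conversion follows. It assumes the trailing -7:00 is a UTC offset (consistent with the GMT-7:00 timezone shown) and that the year is supplied separately, since the entry has none; the results above resolve it to 2021. The year parameter and parse_ts name are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# "Mon DD HH:MM:SS", a '-' separator, then a signed hour:minute UTC offset.
TS = re.compile(r"([A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2})-([+-])(\d{1,2}):(\d{2})")


def parse_ts(ts, year):
    """Parse a value like 'Aug 08 20:09:15--7:00'. The log carries no year,
    so the caller supplies one (hypothetical parameter)."""
    m = TS.fullmatch(ts)
    if m is None:
        raise ValueError(f"unrecognized timestamp: {ts!r}")
    local, sign, hours, minutes = m.groups()
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    if sign == "-":
        offset = -offset
    # %b assumes English month abbreviations (C/English locale).
    naive = datetime.strptime(f"{year} {local}", "%Y %b %d %H:%M:%S")
    aware = naive.replace(tzinfo=timezone(offset))
    label = f"GMT{sign}{int(hours)}:{minutes}"
    return aware.astimezone(timezone.utc), label


utc, label = parse_ts("Aug 08 20:09:15--7:00", 2021)
print(utc.isoformat(), label)  # 2021-08-09T03:09:15+00:00 GMT-7:00
```

Shifting 20:09:15 at UTC-7 forward seven hours lands on 03:09:15 the next day, which is the "Mon Aug 9, 2021 3:09:15 AM GMT" shown in the results.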