Old 04-04-2014, 07:15 PM   #1
bigstar
FlashFXP Developer
FlashFXP Administrator
ioFTPD Beta Tester
 
 
Join Date: Oct 2001
Posts: 8,012

Here's the update
http://get.flashfxp.com/5.0/FlashFXP50_3725_Setup.exe

I've made a small change to the syntax prefix
Code:
rx .*(txt|log)$
Notice that the prefix has changed from regex:<space> to just rx<space>.
Old 04-10-2014, 10:57 AM   #2
DayCuts
Senior Member
FlashFXP Beta Tester
 
Join Date: Dec 2003
Posts: 421

I wrote quite a lengthy, detailed response breaking down the problems with your attempts, the misunderstanding about how lookarounds work, and how to design a pattern that works, but my browser crashed. You will just have to settle for the footnotes and research it yourself to get a better understanding.

Pure PCRE solution:
Code:
(?im)^(?!.+-(publisher1|publisher2)$).+$
Code:
/subdir/subdir/.../Author1_-_Title1_(1234)-Publisher1
/subdir/subdir/.../Author2_-_Title2_(1234)-PublisherX
/subdir/subdir/.../Author2_-_Title2_(1234)-Publisher2
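As a rough check of the pattern above against the three sample paths, here is a minimal Python sketch. It assumes Python's `re` module treats this particular pattern the same way PCRE does (it is not PCRE, and FlashFXP's engine may differ in other respects); the variable names are illustrative only.

```python
import re

# The pattern from the post, without FlashFXP's "rx " prefix.
pattern = re.compile(r"(?im)^(?!.+-(publisher1|publisher2)$).+$")

paths = [
    "/subdir/subdir/.../Author1_-_Title1_(1234)-Publisher1",
    "/subdir/subdir/.../Author2_-_Title2_(1234)-PublisherX",
    "/subdir/subdir/.../Author2_-_Title2_(1234)-Publisher2",
]

# A path matches only if it does NOT end in -publisher1 or -publisher2
# (compared case-insensitively because of the (?i) modifier).
matched = [p for p in paths if pattern.search(p)]
for p in matched:
    print(p)
```

Only the -PublisherX path matches; the other two are rejected by the negative lookahead.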
Other notes:
Quote:
Originally Posted by bigstar View Post
What might be more suited for what you desire is to use the Selective Transfer feature.
I agree with this suggestion; regex pattern matching was not designed for 'non-matching'. Although it can be done, the internal processing is more expensive for anything other than single characters, as is the use of lookarounds, etc. While the above pattern should work in the Skip List, if PCRE matching is now also possible within Selective Transfer rule sets I would highly suggest ditching the expensive 'non-match' style negative lookahead pattern and opting for a normal 'match' style pattern.

Quote:
Originally Posted by bigstar View Post
I've made a small change to the syntax prefix
Can I suggest you reinstate the colon as part of the prefix? There should be no circumstances in which somebody might try (or be able) to match 'rx:<space>...' as a literal (non regex) pattern, however there is the possibility of somebody trying to match 'rx<space>...'.

Quote:
Originally Posted by bigstar View Post
It took me some time to figure out the proper way to ignore case with PCRE, I am not 100% sure if this is correct.
Code:
rx (?i).*-(?!publisher1)
Your use of (?i) here is correct. Given that FlashFXP is a Windows client, and Windows (and its users) mostly think in a case-insensitive manner, it might be okay to make it case-insensitive by default. Just so long as the case-sensitive modifier can be used within the pattern. (?-i) would force case sensitivity.

Modifiers/switches can be used anywhere in a pattern. When a modifier is seen, it is applied to the remainder of the pattern (or of the enclosing group, if it appears inside one), or until it is switched by another modifier. The basic form of a modifier is (?[onswitches][-offswitches][:regex]). This support for :regex means you can do things like (?i)^x(?-i:Y)z to match any case form of xyz as long as the Y is capitalized, where (?-i:Y) is equivalent to (?-i)Y(?i).
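The scoped form can be tried out with a short Python sketch. This assumes Python 3.6 or later, whose `re` module accepts the scoped `(?-i:...)` group; as far as I know it does not accept the unscoped `(?-i)` off-switch that PCRE allows mid-pattern, so only the scoped form is shown. The candidate strings are made up for illustration.

```python
import re

# (?i) turns case-insensitivity on for the whole pattern;
# (?-i:Y) turns it back off for the Y only.
pattern = re.compile(r"(?i)^x(?-i:Y)z")

candidates = ["xYz", "XYZ", "XYz", "xyz", "XyZ"]

# True where the whole string matches, False otherwise.
results = {s: bool(pattern.fullmatch(s)) for s in candidates}
print(results)
```

Any casing of x and z is accepted, but a lowercase y is rejected.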

A great introductory regex tutorial can be found at Regular-Expressions.info.
Old 04-10-2014, 11:21 AM   #3
brackebuschtino
Member
FlashFXP Registered User
 
Join Date: Feb 2012
Location: /dev/null
Posts: 40

Thanks for your reply and the suggested pattern. In fact I did my homework and searched the web as well as asked other developers, which resulted in a negative lookbehind rather than a lookahead.
Code:
rx .*(?<!-PublisherX)$
Quote:
I would highly suggest ditching the expensive 'non-match' style negative lookahead pattern and opting for a normal 'match' style pattern.
The issue with this solution is that one is forced to manually select all highlighted results and put them into the queue, whereas using the skiplist with the above pattern (or yours) allows for putting a complete directory into the queue and leaving the rest to the application, which will reliably drop all non-matching queue items. This is exactly what I want. If I were satisfied with manually scanning a folder and picking the cherries, I wouldn't have asked for the skiplist improvement.

Regarding the "expensiveness":
I think that with today's computing power this plays no role. Furthermore, I think that a little more time for regex processing results in a less intensive server workload. Also, I think that not everybody using FFXP has an active skiplist that might have an impact on the transfer speed.

In fact I'm OK with any implementation (skiplist, selective transfer) that keeps the current state (PCRE support, including lookahead/lookbehind) and allows matching as exactly as wished.

Quote:
[...]reinstate the colon as part of the prefix?[...]
I agree with this suggestion. I also found the blank alone to be potentially more confusing than having the colon visually mark the delimitation. Maybe the blank could be dropped completely, as the colon could satisfy the requirement as a delimiter?

Thanks a bunch for the on/off switch lesson. I didn't know that yet. With this feature available I absolutely agree with your suggestion to make the pattern matching case-insensitive by default.
Old 04-12-2014, 05:36 AM   #4
DayCuts
Senior Member
FlashFXP Beta Tester
 
Join Date: Dec 2003
Posts: 421

Quote:
Originally Posted by brackebuschtino View Post
Thanks for your reply and the suggested pattern. In fact I did my homework and searched the web as well as asked other developers, which resulted in a negative lookbehind rather than a lookahead.
Code:
rx .*(?<!-PublisherX)$
Yep, in fact a lookbehind is the more appropriate selection in this case since the part of the string you're most interested in is at the end. Less expensive as well.
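For illustration, the lookbehind version can be exercised with a small Python sketch. It assumes Python's `re` behaves like PCRE for this fixed-width lookbehind; the "rx " prefix is FlashFXP syntax and is left out, and the sample names are modelled on the paths earlier in the thread.

```python
import re

# Negative lookbehind: match names that do not end in "-PublisherX".
# No (?i) here, so the comparison is case-sensitive.
pattern = re.compile(r".*(?<!-PublisherX)$")

names = [
    "Author1_-_Title1_(1234)-Publisher1",
    "Author2_-_Title2_(1234)-PublisherX",
]

# Only names without the -PublisherX suffix satisfy the pattern.
kept = [n for n in names if pattern.search(n)]
print(kept)
```

The engine only has to inspect the last few characters of each name, which is why this form tends to be cheaper than the lookahead that rescans the whole string.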

Quote:
Originally Posted by brackebuschtino View Post
The issue with this solution is that one is forced to manually select all highlighted results and put them into the queue, whereas using the skiplist with the above pattern (or yours) allows for putting a complete directory into the queue and leaving the rest to the application, which will reliably drop all non-matching queue items. This is exactly what I want. If I were satisfied with manually scanning a folder and picking the cherries, I wouldn't have asked for the skiplist improvement.
I was referring to use in the Selective Transfer rules when I suggested simplifying, which already gives the option to Transfer or Skip and a choice of File and Folder matching. You could use a combination of the skip list for most rules, and a selective transfer ruleset for those that require negating.

Quote:
Originally Posted by brackebuschtino View Post
Regarding the "expensiveness":
I think that with today's computing power this plays no role. Furthermore, I think that a little more time for regex processing results in a less intensive server workload. Also, I think that not everybody using FFXP has an active skiplist that might have an impact on the transfer speed.
Abundant resources are no excuse not to do things in the most efficient way possible. While in a normal regex matching situation (one pattern against one string or file) the cost may be negligible, in a situation where you may end up with multiple look-around rules among a list of dozens of other rules that all have to be checked against a potentially huge list of files/directories, the expense can add up quickly to a noticeable delay. Admittedly, you would likely need a complex skip list and a huge directory listing to notice anything on the average system these days.

Quote:
Originally Posted by bigstar View Post
Both rx<space> and rx:<space> can be used depending on your own preference.

It made more sense to me to simplify the prefix to rx<space> because in most instances trailing spaces are automatically stripped off.
My concern here was more to do with the difference between something like "rx abc*.mp?" being processed as a basic glob or as PCRE. The results would be vastly different due to how the wildcard and period function in regular expressions. Requiring the colon would be a way of ensuring that somebody not familiar with regular expressions (or the support for them in the program) does not write a simple glob rule that is misinterpreted.
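To make the ambiguity concrete, here is a hedged Python sketch that reads the same rule both ways. It uses `fnmatch` as a stand-in for the program's glob matching and `re` as a stand-in for PCRE; how FlashFXP actually strips the prefix and applies the pattern is an assumption, and the file names are hypothetical.

```python
import fnmatch
import re

rule = "rx abc*.mp?"
filename = "rx abc-song.mp3"  # hypothetical file the user meant to match

# Reading 1: the whole rule, including "rx ", is a plain glob.
as_glob = fnmatch.fnmatchcase(filename, rule)

# Reading 2: "rx " is the regex prefix and the remainder is a pattern.
# As a regex, "abc*.mp?" means: "ab", zero or more "c", any one
# character, "m", then an optional "p".
regex = re.compile(rule[len("rx "):])
as_regex = regex.search(filename) is not None

# The regex reading instead matches unrelated names such as "ab_m".
stray = regex.search("ab_m") is not None
print(as_glob, as_regex, stray)
```

The glob reading matches the intended file, while the regex reading misses it and catches a name the user never had in mind.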
Old 04-12-2014, 10:36 AM   #5
brackebuschtino
Member
FlashFXP Registered User
 
Join Date: Feb 2012
Location: /dev/null
Posts: 40

Quote:
Yep, in fact a lookbehind is the more appropriate selection in this case since the part of the string you're most interested in is at the end. Less expensive as well.
Unfortunately this doesn't seem to allow for grouping or appending an additional group that might exist. At least it didn't work for me with:

Code:
rx .*(?<!-PublisherX(_int)?)$
Code:
rx .*(?<!-(PublisherX|OtherY))$
Old 04-12-2014, 07:29 PM   #6
DayCuts
Senior Member
FlashFXP Beta Tester
 
Join Date: Dec 2003
Posts: 421

I learned something when trying to figure out why (_int)? worked in a look-ahead but not a look-behind. The answer is that in almost all regex flavors (language implementations) a look-behind must be a fixed-width expression. Not only does this mean you cannot include ? + *, which rules out anything like (_int)?, but you also cannot include alternatives of different lengths like (pub|longpublishername). Ultimately this means that a look-behind is not a viable option for your purposes unless the language the program is developed in is .NET or ABA.
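The fixed-width restriction is easy to reproduce. The sketch below uses Python's `re` module, which enforces a similar rule (its details are not identical to PCRE's), to show which of the look-behind patterns from this thread are accepted. The chained pattern at the end is one possible workaround and was not proposed in the thread: it uses one fixed-width look-behind per name, with `compiles` being a hypothetical helper.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

fixed = compiles(r".*(?<!-PublisherX)$")                 # fixed width
optional = compiles(r".*(?<!-PublisherX(_int)?)$")       # optional suffix
alternation = compiles(r".*(?<!-(PublisherX|OtherY))$")  # unequal lengths

# Possible workaround: chain one fixed-width look-behind per name.
chained = re.compile(r".*(?<!-PublisherX)(?<!-OtherY)$")
print(fixed, optional, alternation)
```

Each chained look-behind has a constant width on its own, so the engine accepts it; an optional suffix such as _int would need its own additional look-behind.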

Now on to a solution. First of all, one reason optionals were not working for you is that you are forgetting part of the expression: the modifiers and anchors are important. I did come up with a working solution using a look-behind, but my test list used equal-length publisher names. It failed thereafter due to the fixed-length requirement, but here it is anyway...
Code:
(?im)(?(DEFINE)(?<publist>(?:publisher1|publisher2)))^.+(?(?<=_int$)(?<!-(?&publist)_int)|(?<!-(?&publist))$)
An updated version of the original pattern...
Code:
(?im)^(?!.+-(?:publisher1|publisher2)(?:_int)?$).+$
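The updated look-ahead pattern can be checked with a minimal Python sketch. Python's `re` is assumed to agree with PCRE for this pattern; the look-behind variant above relies on (?(DEFINE)...), which Python's `re` does not support, so it is not reproduced here. The sample names are hypothetical.

```python
import re

# Updated pattern from the post, without the "rx " prefix.
pattern = re.compile(r"(?im)^(?!.+-(?:publisher1|publisher2)(?:_int)?$).+$")

names = [
    "Author1_-_Title1_(1234)-Publisher1",
    "Author2_-_Title2_(1234)-publisher2_INT",
    "Author3_-_Title3_(1234)-PublisherX_int",
]

# A name matches only if it does not end in one of the listed
# publishers, with or without the optional _int suffix.
matched = [n for n in names if pattern.search(n)]
print(matched)
```

Both listed publishers are rejected with or without the _int suffix, regardless of case, and only the PublisherX entry matches.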
Old 04-14-2014, 10:50 AM   #7
brackebuschtino
Member
FlashFXP Registered User
 
Join Date: Feb 2012
Location: /dev/null
Posts: 40

Thanks a lot for this lesson. It turns out that using lookarounds is very tricky. I would never have been able to adapt this pattern on my own.
Old 04-10-2014, 04:06 PM   #8
bigstar
FlashFXP Developer
FlashFXP Administrator
ioFTPD Beta Tester
 
 
Join Date: Oct 2001
Posts: 8,012

Quote:
Originally Posted by DayCuts View Post
Can I suggest you reinstate the colon as part of the prefix? There should be no circumstances in which somebody might try (or be able) to match 'rx:<space>...' as a literal (non regex) pattern, however there is the possibility of somebody trying to match 'rx<space>...'.
Both rx<space> and rx:<space> can be used depending on your own preference.

It made more sense to me to simplify the prefix to rx<space> because in most instances trailing spaces are automatically stripped off.
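As a purely hypothetical sketch of how a rule list could accept both prefixes, the Python below classifies a rule as a regex or a glob. `split_rule` and its return values are invented for illustration and say nothing about how FlashFXP is actually implemented.

```python
import re

def split_rule(rule: str):
    """Classify a rule as ("regex", pattern) or ("glob", rule).

    Hypothetical helper: accepts both the "rx " and "rx: " prefixes.
    """
    for prefix in ("rx: ", "rx "):
        if rule.startswith(prefix):
            return "regex", rule[len(prefix):]
    return "glob", rule

# The rule from the first post: names ending in "txt" or "log".
kind, pattern = split_rule("rx .*(txt|log)$")
matches_log = re.search(pattern, "transfer.log") is not None
print(kind, pattern, matches_log)
```

With both prefixes accepted, a rule starting with rx<space> can never be treated as a glob, which is the ambiguity raised earlier in the thread.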

Quote:
Originally Posted by DayCuts View Post
Just so long as the case-sensitive modifier can be used within the pattern. (?-i) would force case sensitivity.
Thank you for the clarification; I was not aware of using - to reverse the modifier.

I don't use regexp as much as one might think, and most of this is new to me as well.