Cplusplus: regex


Tuesday, March 3, 2020

The std::regex Class

Introduction
We include the following header.

#include <regex>

Constructor - type
The signature is as follows.

basic_regex(const charT* p, flag_type f = regex_constants::ECMAScript);

The grammar can be one of ECMAScript, basic, extended, awk, grep, or egrep. If none is chosen, ECMAScript is used. The explanation is as follows.

A valid value of type syntax_option_type shall have exactly one of the elements ECMAScript, basic, extended, awk, grep, egrep, set.
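
To see how the grammar flag changes the meaning of a pattern, here is a minimal sketch. The helper name matches_with_grammar is made up for this example and is not part of the library. It relies on + being an ordinary character in the POSIX basic grammar, while ECMAScript and extended treat it as the one-or-more quantifier.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: compiles `pattern` with the given grammar flag
// and reports whether it matches the whole of `text`.
bool matches_with_grammar(const std::string& text,
                          const std::string& pattern,
                          std::regex::flag_type grammar) {
  std::regex re(pattern, grammar);
  return std::regex_match(text, re);
}
```

With the pattern "a+" and the text "aaa", ECMAScript and extended should report a match, while basic should only match the literal text "a+".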

Example - collate
The explanation is as follows.

collate Character ranges of the form "[a-b]" will be locale sensitive.

We do it like this.

std::regex re{"[А-Яа-яЁё]+", std::regex::collate};

Example - extended
We do it like this.

std::regex regex("[.]", std::regex::extended);

Example - icase
We do it like this.

std::regex regex ("...", std::regex::ECMAScript | std::regex::icase);

Example - icase
No grammar is specified in the flags; only icase is passed. We do it like this.

std::string pattern= "...";
std::regex regex (pattern,std::regex_constants::icase);
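
Since the patterns above are elided, here is a small self-contained sketch of what icase does. The helper name equals_ignoring_case is an assumption for illustration only. ECMAScript is spelled out so that exactly one grammar flag is set alongside icase.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: whole-string match that ignores letter case.
bool equals_ignoring_case(const std::string& text, const std::string& pattern) {
  // Combine the grammar flag with icase using bitwise OR.
  std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
  return std::regex_match(text, re);
}
```

For example, equals_ignoring_case("HeLLo", "hello") should be true, whereas the same pattern compiled without icase would not match "HeLLo".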

ECMAScript and Non-Greedy Behavior
The explanation is as follows. In other words, when the ? character comes immediately after a quantifier, that quantifier becomes non-greedy.

The ? is treated differently (i.e., is not the zero-or-one quantifier) when it immediately follows a quantifier (*, +, ?, {exact}, {min,} and {min,max}) in that it makes the matching non-greedy:

A further explanation is as follows.

By default, all these quantifiers are greedy (i.e., they take as many characters that meet the condition as possible). This behavior can be overridden to ungreedy (i.e., take as few characters that meet the condition as possible) by adding a question mark, ?, after the quantifier.

For example, matching "(a+).*" against "aardvark" succeeds and yields aa as the first submatch, while matching "(a+?).*" against it also succeeds, but yields a as the first submatch.
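
The aardvark example from the quote can be tried out with a sketch like the following. The helper name first_submatch is hypothetical; it simply returns the first capture group of a whole-string match.

```cpp
#include <regex>
#include <string>

// Returns the text of the first capture group when `pattern` matches the
// whole of `text`, or an empty string when there is no match.
std::string first_submatch(const std::string& text, const std::string& pattern) {
  std::regex re(pattern);  // ECMAScript grammar by default
  std::smatch m;
  if (std::regex_match(text, m, re) && m.size() > 1) {
    return m[1].str();
  }
  return "";
}
```

Calling first_submatch("aardvark", "(a+).*") should give "aa" (greedy), and first_submatch("aardvark", "(a+?).*") should give "a" (non-greedy).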

Example
To capture the text between two tags, we do it like this.

std::string html = "<ul><a href=\"http://stackoverflow.com\">SO</a></ul> "
                      "<ul>abc</ul>\n";
std::regex url_re(R"(<ul>([\s\S]*?)<\/ul>)", std::regex::icase);
std::copy( std::sregex_token_iterator(html.begin(), html.end(), url_re, 1),
           std::sregex_token_iterator(),
           std::ostream_iterator<std::string>(std::cout, "\n"));
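
The same idea can be sketched as a function that collects the captured text into a vector instead of printing it. The function name text_between_ul_tags is an assumption made for this example. The non-greedy *? is what keeps each match from running on to the last closing tag.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects the text found between each <ul> ... </ul> pair.
std::vector<std::string> text_between_ul_tags(const std::string& html) {
  static const std::regex ul_re(R"(<ul>([\s\S]*?)</ul>)", std::regex::icase);
  // Submatch index 1 selects the capture group rather than the whole match.
  std::sregex_token_iterator first(html.begin(), html.end(), ul_re, 1);
  std::sregex_token_iterator last;
  return std::vector<std::string>(first, last);
}
```

Because of icase, upper-case tags such as <UL> are accepted too.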

Monday, June 4, 2018

The std::regex_match Function

Introduction
To use this function, we include the following header.

#include <regex>

These classes, introduced with C++11, are very useful. Previously we used the C-style functions shipped with GNU (the POSIX regex API) and included the following header.

#include <regex.h>

I took the explanation from Boost, but it applies to std::regex_match as well.

The algorithm regex_match determines whether a given regular expression matches all of a given character sequence denoted by a pair of bidirectional-iterators, the algorithm is defined as follows, the main use of this function is data input validation.

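It returns true or false depending on whether the entire string matches the given regex. It resembles Regex.IsMatch() in C#, with the difference that IsMatch reports a match anywhere in the input unless the pattern is anchored with ^ and $. The first parameter is the string to validate and the second is the regular expression.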
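
To make the "entire string" requirement concrete, the sketch below contrasts std::regex_match with std::regex_search, which accepts a match anywhere in the input. Both helper names are made up for this illustration.

```cpp
#include <regex>
#include <string>

// Whole-string validation: every character of `text` must be consumed.
bool is_all_digits(const std::string& text) {
  static const std::regex digits("[0-9]+");
  return std::regex_match(text, digits);
}

// Substring search: true if digits occur anywhere in `text`.
bool contains_digits(const std::string& text) {
  static const std::regex digits("[0-9]+");
  return std::regex_search(text, digits);
}
```

For the input "123a", is_all_digits should return false while contains_digits should return true.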

regex_match - string + regex + smatch
It returns bool. It reports whether there is a match and also lets us capture the results. In Java, we do it like this.

Pattern p = Pattern.compile("^[\\w._]+@\\w+\\.[a-zA-Z]+$");
boolean matched = p.matcher("abcd_ed123.t12y@haha.com").matches();

Example
We do it like this.

std::string str;

std::regex reg (R"regex((\d+)([GMKgmk]){0,1})regex");
std::smatch match;
bool matched = std::regex_match(str, match, reg);
if(matched) {
  ...
}

Example
We do it like this.

const regex reg("([^:]+):([[:digit:]]+)");
smatch match;
if(regex_match(str, match,reg)) 
{
  cout << "ip: " << match[1] << " port: " << match[2] << endl;
}

regex_match - iterator + regex
The signature is as follows. It only reports whether there is a match.

template <class BidirectionalIterator, class charT, class traits>
bool regex_match(BidirectionalIterator first, BidirectionalIterator last,
                 const basic_regex<charT, traits> & e,
                 regex_constants::match_flag_type flags =
                 regex_constants::match_default);
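
One use of the iterator overload is validating part of a string without copying it. The following is only a sketch; the helper name range_is_number and its offset/count parameters are assumptions for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: checks whether the `count` characters of `text`
// starting at `offset` form a run of decimal digits, without copying them.
bool range_is_number(const std::string& text, std::size_t offset,
                     std::size_t count) {
  if (offset > text.size() || count > text.size() - offset) {
    return false;  // the requested range does not fit inside the string
  }
  static const std::regex number("[0-9]+");
  auto first = text.begin() + static_cast<std::string::difference_type>(offset);
  auto last = first + static_cast<std::string::difference_type>(count);
  // The iterator pair [first, last) must match the regex in its entirety.
  return std::regex_match(first, last, number);
}
```

For the text "id=42;", the range starting at offset 3 with length 2 covers "42" and should match, while offset 2 with length 3 covers "=42" and should not.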

regex_match - string + regex
We do it like this. It only reports whether there is a match.

#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() {
  regex e("1");
  string s = "1,";
 
  cout << regex_match("1", e) << endl; // prints 1 (true)
  cout << regex_match(s, e) << endl;   // prints 0 (false): the trailing "," is not matched

  return 0;
}

We do it like this. It only reports whether there is a match.

#include <regex>

std::string str = "11111111111";

if (false == std::regex_match(str, std::regex("[-0-9]+")))
{
  std::cout << "Invalid Phone Number!\n";
}

Boost
We can write the same example with Boost in exactly the same way.

#include <boost/regex.hpp>
#include <iostream>
#include <string>
int main() {
  boost::regex e("1");

  if(boost::regex_match("1", e))
    std::cout<<"the regex matched\n";
}

Monday, January 22, 2018

std::regex_replace

Introduction
We include the following header.

#include <regex>

These classes, introduced with C++11, are very useful. Previously we used the C-style functions shipped with GNU (the POSIX regex API) and included the following header.

#include <regex.h>

The input string is transformed according to the pattern and a new string is returned. It is similar to the Regex.Replace method in C#.

string result = Regex.Replace(input, pattern, replacement);

match_flag
The explanation is as follows.

format_default  Use ECMAScript rules to construct strings in
                std::regex_replace
format_sed      Use POSIX sed utility rules in std::regex_replace.
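
The difference between the two format flags can be sketched as follows. Under the default ECMAScript rules a capture group is written as $1, while under sed rules it is written as \1. The function names are made up for this example.

```cpp
#include <regex>
#include <string>

// Swaps two space-separated words using the default ECMAScript format
// rules, where capture groups are written as $1, $2, ...
std::string swap_words_default(const std::string& text) {
  static const std::regex two_words("(\\w+) (\\w+)");
  return std::regex_replace(text, two_words, "$2 $1");
}

// Does the same using sed format rules, where capture groups are
// written as \1, \2, ...
std::string swap_words_sed(const std::string& text) {
  static const std::regex two_words("(\\w+) (\\w+)");
  return std::regex_replace(text, two_words, "\\2 \\1",
                            std::regex_constants::format_sed);
}
```

Both functions should turn "John Smith" into "Smith John".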

regex_replace - output iterator + input iterator begin + input iterator end + regex + format string
It works with iterators instead of a string. We do it like this.

std::ofstream fs ("filename");
std::string text = ...;


std::regex_replace(std::ostreambuf_iterator<char>(fs),
  text.begin(), text.end(), std::regex("(\\n+)"), "\n");
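
The same overload can write into any output iterator, not just a file stream. Here is a minimal sketch that appends to a std::string through std::back_inserter; the function name collapse_newlines is an assumption for illustration.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Collapses each run of newlines into a single newline, writing the result
// through an output iterator instead of returning it from regex_replace.
std::string collapse_newlines(const std::string& text) {
  static const std::regex newlines("\\n+");
  std::string out;
  std::regex_replace(std::back_inserter(out), text.begin(), text.end(),
                     newlines, "\n");
  return out;
}
```

Text outside the matches is copied to the output unchanged.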

regex_replace - input string + regex + format string
We do it like this.

std::regex r("\\s+",  std::regex::optimize);
std::string test = ...;

std::string out = std::regex_replace(test, r, " ");

regex_replace - input string + regex + back reference inside the format string
Back reference (capture group) values can be used in the replacement. A capture group is written as $n or $nn.

$n

The nth capture, where n is a single digit in the range 1 to 9 and $n is not followed by a decimal digit. If n ≤ m and the nth capture is undefined, use the empty String instead. If n > m, the result is implementation-defined.

$nn

The nnth capture, where nn is a two-digit decimal number in the range 01 to 99. If nn ≤ m and the nnth capture is undefined, use the empty String instead. If nn > m, the result is implementation-defined.

Example
Here the two-digit $nn form is used with 01, so that the 0 of "0xNUM" that follows is not read as part of the group number.

regex regex_a( "(.*)bar(.*)" );
cout << regex_replace( "foobar0x1", regex_a, "$010xNUM" ) << endl;

We get the following output.

foo0xNUM

Example
Here the $1 form is used. We do it like this.

std::string text = "...";
std::regex re("...");
std::string out = std::regex_replace(text, re, "<$1>");
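
Since the pattern above is elided, here is a small concrete sketch of the $1 form. The function name bracket_numbers and the digit pattern are chosen only for illustration.

```cpp
#include <regex>
#include <string>

// Wraps every run of digits in angle brackets; $1 refers to the first
// capture group of each match.
std::string bracket_numbers(const std::string& text) {
  static const std::regex number("([0-9]+)");
  return std::regex_replace(text, number, "<$1>");
}
```

For the input "a1b22c" this should produce "a<1>b<22>c", because regex_replace replaces every match by default.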

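Other
boost::regex_replace allows capture groups to be referenced in the form \\1. I think this usage is Perl style. I don't know whether it is compatible with C++11; as far as I can tell, std::regex_replace accepts the \\1 form only when the format_sed flag is passed, and otherwise expects $1.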

string s = boost::regex_replace(
    string("Example_45-3"),
    boost::regex("[^0-9]*([0-9]+).*"),
    string("\\1")
    );