fuzzing re2c
My dearest friend maintains re2c
(The fast c
lexer
generator).
Conceptually re2c
is a simple tool: it accepts single source file in
.re
format and generates file in .c
format.
It usually works fine but occasionally (mostly in development branches)
re2c
fails to produce valid output. Extensive test
suite is meant
to cover past errors and complex lexers seen in the wild but it does not
help testing new features and bizarre corner cases.
One day inspired with QuickFuzz
I’ve decided
to spend 30 minutes to write a fuzzer for re2c
.
I started from very simple model: we’ll support only basic set of
operations. The full list is:
"a"
"b"
"(" ... ")"
(brackets)"*"
(star)- concatenation
"|"
(alternative)
Haskell community has a wonderful QuickCheck
library
(paper)
to generate random instances of a given datatype. The only thing I need
is to put these random instances into the file and run re2c
self-validation on it. -S
(skeleton
mode) option
generates a validator, not a normal lexer.
-- qc_re2c.hs
{-# LANGUAGE LambdaCase #-}
import qualified Test.QuickCheck as Q
import qualified Test.QuickCheck.Monadic as QM
import qualified System.Random as SR
import qualified System.Process as SP
import qualified System.Exit as SE
import qualified Data.ByteString.Char8 as BS
data E = A
| B
| Alt E E
| Cat E E
| Star E
instance Show E where
show = \case
A -> "\"a\""
B -> "\"b\""
Alt l r -> "(" ++ show l ++ "|" ++ show r ++ ")"
Cat l r -> "(" ++ show l ++ show r ++ ")"
Star e -> "(" ++ show e ++ ")" ++ "*"
instance Q.Arbitrary E where
= do d <- Q.choose (0,5) :: Q.Gen Int
arbitrary
arbitrary_d d
0 = Q.oneof [ pure A, pure B ]
arbitrary_d = Q.frequency [ (10, pure A)
arbitrary_d d 10, pure B)
, (20, Alt <$> arbitrary_d d' <*> arbitrary_d d')
, (20, Cat <$> arbitrary_d d' <*> arbitrary_d d')
, (20, Star <$> arbitrary_d d')
, (
]where d' = pred d
foo :: IO [E]
= Q.generate Q.arbitrary
foo
prop_test_re2c :: E -> E -> Q.Property
= QM.monadicIO $ do
prop_test_re2c r1 r2 let re_file = unlines [ "/*!re2c"
show r1 ++ " {}"
, show r2 ++ " {}"
, "*/"
,
]<- QM.run $ do BS.writeFile "a.re" $ BS.pack re_file
s1 "re2c -Werror-undefined-control-flow -S a.re -o a.c 2>>re2c_last_warning || exit 42 && gcc a.c -o a && ./a"
SP.system $ s1 `elem` [SE.ExitSuccess, SE.ExitFailure 42]
QM.assert
main :: IO ()
-- main = Q.quickCheck prop_test_re2c
= Q.quickCheckWith Q.stdArgs { Q.maxSuccess = 10000 } prop_test_re2c main
Running the thing is easy:
$ runhaskell qc_re2c.hs
It generates samples similar to the following:
/*!re2c
(((("b")*)*|(("b")*)*))* {}
"b" {}
*/
We can extend fuzzer to support things like "[^abc]"
(set
negation), add more regex clauses and so on.
Found issues so far:
Now computer can write (and trim) code samples for you.
Simple!