fuzzing re2c
My dearest friend maintains re2c (The fast C lexer generator).
Conceptually re2c is a simple tool: it accepts single source file in .re format and generates file in .c format.
It usually works fine but occasionally (mostly in development branches) re2c fails to produce valid output. Extensive test suite is meant to cover past errors and complex lexers seen in the wild but it does not help testing new features and bizarre corner cases.
One day inspired with QuickFuzz I’ve decided to spend 30 minutes to write a fuzzer for re2c.
I’ve started from very simple model: we’ll support only basic set of operations. The full list is:
- “a”
- “b”
- “(” … “)” (brackets)
- “*” (star)
- concatenation
- “|” (alternative)
Haskell community has a wonderful QuickCheck library (paper) to generate random instances of a given datatype. The only thing I need is to put these random instances into the file and run re2c selfvalidation on it. -S (skeleton mode) option generates a validator, not a normal lexer.
-- qc_re2c.hs
{-# LANGUAGE LambdaCase #-}
import qualified Test.QuickCheck as Q
import qualified Test.QuickCheck.Monadic as QM
import qualified System.Random as SR
import qualified System.Process as SP
import qualified System.Exit as SE
import qualified Data.ByteString.Char8 as BS
data E = A
| B
| Alt E E
| Cat E E
| Star E
instance Show E where
show = \case
A -> "\"a\""
B -> "\"b\""
Alt l r -> "(" ++ show l ++ "|" ++ show r ++ ")"
Cat l r -> "(" ++ show l ++ show r ++ ")"
Star e -> "(" ++ show e ++ ")" ++ "*"
instance Q.Arbitrary E where
= do d <- Q.choose (0,5) :: Q.Gen Int
arbitrary
arbitrary_d d
0 = Q.oneof [ pure A, pure B ]
arbitrary_d = Q.frequency [ (10, pure A)
arbitrary_d d 10, pure B)
, (20, Alt <$> arbitrary_d d' <*> arbitrary_d d')
, (20, Cat <$> arbitrary_d d' <*> arbitrary_d d')
, (20, Star <$> arbitrary_d d')
, (
]where d' = pred d
foo :: IO [E]
= Q.generate Q.arbitrary
foo
prop_test_re2c :: E -> E -> Q.Property
= QM.monadicIO $ do
prop_test_re2c r1 r2 let re_file = unlines [ "/*!re2c"
show r1 ++ " {}"
, show r2 ++ " {}"
, "*/"
,
]<- QM.run $ do BS.writeFile "a.re" $ BS.pack re_file
s1 "re2c -Werror-undefined-control-flow -S a.re -o a.c 2>>re2c_last_warning || exit 42 && gcc a.c -o a && ./a"
SP.system $ s1 `elem` [SE.ExitSuccess, SE.ExitFailure 42]
QM.assert
main :: IO ()
-- main = Q.quickCheck prop_test_re2c
= Q.quickCheckWith Q.stdArgs { Q.maxSuccess = 10000 } prop_test_re2c main
Running the thing is easy:
$ runhaskell qc_re2c.hs
It generates samples similar to the following:
/*!re2c
(((("b")*)*|(("b")*)*))* {}
"b" {}
*/
We can extend fuzzer to support things like “[^abc]” (set negation), add more regex clauses and so on.
Found issues so far:
Now computer can write (and trim) code samples for you.
Simple!